Pairfam Activity Trajectories Dataset

This dataset contains employment/activity trajectories of 1,027 individuals in Germany. It is derived from the German Family Panel (pairfam, Release 14.2) and was pre-processed by the authors of Sequence Analysis (Raab & Struffolino, 2022). It is designed for teaching and learning sequence analysis and provides ready-to-use trajectories of employment status.

We provide two versions of the dataset:

  • Year-level data: 22 yearly observations with state abbreviations
  • Month-level data: 264 monthly observations (ages 18 to 40) with numeric state codes

Important Notes

  • The IDs differ between the year-level and month-level data, so records cannot be linked across the two versions.
  • State encoding differs: Year-level uses text abbreviations (e.g., "EDU", "FT"), while month-level uses numeric codes (1–8).
  • The underlying state definitions remain the same across both versions.

Data origin and processing

  • Source: pairfam, a large-scale longitudinal survey on partnership and family dynamics in Germany.

  • Processing by book authors: Employment and activity status was categorized into 8 distinct states representing different types of labor market participation and non-participation.

  • Our preprocessing: To make the data more convenient to use, we performed one minor preprocessing step, renaming the columns state1 ... state264 to 1 ... 264 before adding the data to our prepared dataset.

    The preprocessing function we use is clean_time_columns_auto(), which cleans column names. It scans a DataFrame, identifies columns whose names contain numbers (e.g., state1, wave2, year2023), and reduces those names to the numbers they contain (1, 2, 2023). This is particularly useful for time-series or panel data, where it quickly standardizes column names that represent different points in time.

    For more details on how we cleaned and prepared the data, see the data cleaning code repository.

  • Result: An 8-state alphabet.
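The renaming step described above (state1 ... state264 becoming 1 ... 264) can be sketched in a few lines. This is only an illustration of the idea, not the actual implementation of clean_time_columns_auto(): the helper name strip_prefix is hypothetical, and it uses a narrower rule than the automatic detection described above, renaming only columns that consist of a given prefix followed by digits.

```python
import re


def strip_prefix(name: str, prefix: str = "state") -> str:
    """Hypothetical helper: turn 'state12' into '12'.

    Names that are not exactly `prefix` followed by digits are returned
    unchanged, so covariates such as 'weight40' are left alone.
    """
    match = re.fullmatch(rf"{re.escape(prefix)}(\d+)", name)
    return match.group(1) if match else name


columns = ["id", "weight40", "state1", "state2", "state264"]
renamed = [strip_prefix(c) for c in columns]
print(renamed)  # ['id', 'weight40', '1', '2', '264']
```

Restricting the rule to a known prefix is a deliberate choice in this sketch: several covariates in the month-level data (weight40, sat5, famstructure18) also contain digits and should keep their names. With pandas, a function like this could be passed to DataFrame.rename(columns=strip_prefix).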

Activity states encoding

| Numeric Code | Abbreviation | Description |
|---|---|---|
| 1 | EDU | Education |
| 2 | MIL/CS | Military/Civil Service |
| 3 | PT | Part-time Employment |
| 4 | FT | Full-time Employment |
| 5 | SELF | Self-employed |
| 6 | PLEAVE | Parental Leave |
| 7 | MARGINAL | Marginal Employment |
| 8 | UNEMP | Unemployed |
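Because the month-level data uses the numeric codes and the year-level data uses the abbreviations, it can help to keep the encoding as a lookup table. The following is a minimal sketch; the names STATE_LABELS, STATE_CODES, decode, and encode are illustrative and not part of any library.

```python
# Numeric code -> abbreviation, copied from the encoding table.
STATE_LABELS = {
    1: "EDU",
    2: "MIL/CS",
    3: "PT",
    4: "FT",
    5: "SELF",
    6: "PLEAVE",
    7: "MARGINAL",
    8: "UNEMP",
}

# Reverse lookup: abbreviation -> numeric code.
STATE_CODES = {label: code for code, label in STATE_LABELS.items()}


def decode(codes):
    """Translate month-level numeric codes into abbreviations."""
    return [STATE_LABELS[int(code)] for code in codes]


def encode(labels):
    """Translate year-level abbreviations into numeric codes."""
    return [STATE_CODES[label] for label in labels]


print(decode([4, 4, 4, 4, 4]))                        # ['FT', 'FT', 'FT', 'FT', 'FT']
print(encode(["EDU", "EDU", "MIL/CS", "FT", "UNEMP"]))  # [1, 1, 2, 4, 8]
```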

Year-level data

File: pairfam_activity_by_year.csv

This dataset contains 1,029 individuals observed over 22 years. The states are encoded directly as text abbreviations (e.g., "EDU", "FT", "PT", "PLEAVE").

No Covariates

Unlike the month-level data, the year-level data does not include any covariates. The id column contains randomly generated identifiers created during our preprocessing and cannot be linked to other datasets or the month-level data.

Structure

| Column | Description |
|---|---|
| id | Individual identifier (integers generated during our preprocessing, e.g., 194, 896, 284; cannot be linked to other datasets) |
| 1–22 | Yearly activity trajectory states, encoded as abbreviations (e.g., "EDU", "FT") |

Sample data

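| id | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 194 | FT | PLEAVE | FT | FT | FT | FT | PLEAVE | PLEAVE | PLEAVE | FT |
| 896 | EDU | EDU | MIL/CS | FT | UNEMP | FT | FT | FT | FT | FT |
| 284 | EDU | EDU | EDU | EDU | EDU | EDU | FT | FT | FT | FT |
| 886 | EDU | EDU | EDU | EDU | EDU | EDU | EDU | EDU | EDU | UNEMP |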
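As a quick illustration of working with this layout, the sketch below counts how many years each person spends in each state. It runs on the ten-column excerpt shown in the sample data rather than on the full 22-year file, and it assumes a comma-separated file with a header row of id, 1, 2, ...; that layout is an assumption, so check it against the actual CSV.

```python
import csv
import io
from collections import Counter

# Excerpt of the sample data (first 10 of 22 years), assumed comma-separated.
sample_csv = """id,1,2,3,4,5,6,7,8,9,10
194,FT,PLEAVE,FT,FT,FT,FT,PLEAVE,PLEAVE,PLEAVE,FT
896,EDU,EDU,MIL/CS,FT,UNEMP,FT,FT,FT,FT,FT
284,EDU,EDU,EDU,EDU,EDU,EDU,FT,FT,FT,FT
886,EDU,EDU,EDU,EDU,EDU,EDU,EDU,EDU,EDU,UNEMP
"""

years_in_state = {}
for row in csv.DictReader(io.StringIO(sample_csv)):
    person_id = row.pop("id")          # everything left in the row is a state
    years_in_state[person_id] = Counter(row.values())

for person_id, counts in years_in_state.items():
    print(person_id, dict(counts))
```

For person 194, for example, this yields 6 years in FT and 4 years in PLEAVE within the excerpt.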

Month-level data

File: pairfam_activity_by_month.csv

This dataset contains 1,027 individuals observed monthly from ages 18 to 40 (264 months). The states are encoded as numeric codes (1–8) according to the encoding table above.

Structure

Besides the state sequences, the dataset includes several covariates:

| Column | Description |
|---|---|
| id | Individual identifier (original pairfam IDs, e.g., 111000, 2931000) |
| weight40 | Survey weight at age 40 (design weight) |
| sex | Sex (1 = male, 0 = female) |
| doby_gen | Year of birth (generation year) |
| dob | Month-year of birth (numerical encoding) |
| ethni | Ethnicity indicator |
| migstatus | Migration background status |
| yeduc | Years of education |
| sat1i4, sat5, sat6 | Selected satisfaction indicators from the survey |
| highschool | High school graduation status |
| church | Church attendance indicator |
| biosib | Number of biological siblings |
| stepsib | Number of step-siblings |
| east | Region indicator (East vs. West Germany) |
| famstructure18 | Family structure at age 18 |
| 1–264 | Monthly activity trajectory states, coded 1–8 as above |

Sample data

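| id | weight40 | sex | doby_gen | dob | ethni | migstatus | yeduc | highschool | church | biosib | east | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 111000 | 0.344 | 1 | 1971 | 855 | 1 | 1 | 11.5 | 0 | 0 | 1 | 1 | 4 | 4 | 4 | 4 | 4 |
| 2931000 | 1.767 | 0 | 1973 | 881 | 5 | 3 | 10.5 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 3491000 | 0.727 | 1 | 1971 | 857 | 1 | 1 | 18.0 | 1 | 1 | 3 | 0 | 1 | 1 | 1 | 1 | 1 |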

Here, columns 1–5 show the first five months of the trajectory, coded as 1–8 according to the state table above.
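Since the month-level file mixes covariates and trajectory columns, a common first step is to separate the two. One simple way, sketched below, is to treat every column whose name consists only of digits as a state column. The rows are the sample values as read from the sample data (covariates plus the first five months only), and the comma-separated layout is an assumption about the file.

```python
import csv
import io

# Sample rows: selected covariates plus months 1-5, assumed comma-separated.
sample_csv = """id,weight40,sex,doby_gen,dob,ethni,migstatus,yeduc,highschool,church,biosib,east,1,2,3,4,5
111000,0.344,1,1971,855,1,1,11.5,0,0,1,1,4,4,4,4,4
2931000,1.767,0,1973,881,5,3,10.5,0,1,1,0,1,1,1,1,1
3491000,0.727,1,1971,857,1,1,18.0,1,1,3,0,1,1,1,1,1
"""

reader = csv.DictReader(io.StringIO(sample_csv))

# Purely numeric column names are months; everything else is a covariate.
state_cols = sorted((c for c in reader.fieldnames if c.isdigit()), key=int)
covariate_cols = [c for c in reader.fieldnames if not c.isdigit()]

sequences = {}  # id -> list of numeric state codes in month order
weights = {}    # id -> survey weight at age 40
for row in reader:
    sequences[row["id"]] = [int(row[c]) for c in state_cols]
    weights[row["id"]] = float(row["weight40"])

print(state_cols)           # ['1', '2', '3', '4', '5']
print(sequences["111000"])  # [4, 4, 4, 4, 4]
```

Sorting the state columns with key=int keeps the months in chronological order even when the names are strings, which matters once the columns run up to 264.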

Multichannel data (reference only)

For multichannel sequence analysis combining both family and activity trajectories, see the MultiChannel.csv file documented in the Pairfam Family Trajectories page.

Note that MultiChannel.csv is not supported by load_dataset() and is provided for reference only. You can download it manually from the month-level data sources repository.

Reference

Raab, M., & Struffolino, E. (2022). Sequence analysis (Vol. 190). Sage Publications.

Author: Yuqi Liang