Markov Chain Models
Hidden Markov models (HMMs) treat observed life-course sequences as emissions from unobserved latent states. Instead of comparing whole trajectories with a distance matrix, you specify a generative model: how likely each hidden state is at the start, how likely transitions between hidden states are, and how each hidden state produces observed states.
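The three ingredients named above can be sketched in plain NumPy. This is only an illustration of the generative model, not Sequenzo's implementation: the parameter values, the variable names, and the helper forward_log_likelihood are all hypothetical. It computes the log-likelihood of one integer-coded observed sequence with the scaled forward algorithm.

```python
import numpy as np

# Hypothetical toy model: 2 hidden states, 3 observed states (illustrative values only).
initial_probs = np.array([0.6, 0.4])            # P(hidden state at t = 1)
transition_probs = np.array([[0.9, 0.1],
                             [0.2, 0.8]])       # row i: P(next hidden state | current = i)
emission_probs = np.array([[0.7, 0.2, 0.1],
                           [0.1, 0.3, 0.6]])    # row i: P(observed state | hidden = i)


def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of one integer-coded sequence (scaled forward algorithm)."""
    alpha = pi * B[:, obs[0]]          # joint of first observation and hidden state
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale              # rescale to avoid underflow on long sequences
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]  # propagate one step, then weight by emission
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik


observed = [0, 0, 1, 2, 2]             # hypothetical observed sequence
log_lik = forward_log_likelihood(observed, initial_probs, transition_probs, emission_probs)
print(log_lik)
```

Fitting an HMM amounts to searching for the three parameter sets that maximize this quantity summed over all sequences.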
These pages document sequenzo.seqhmm, Sequenzo's Python implementation inspired by the R seqHMM package (Helske & Helske, 2019). The API follows seqHMM's workflow (build, fit, predict, visualize) while using Python conventions and SequenceData as the main input type.
What You Need Before You Start
Most pages assume that you already have:
- A SequenceData object with one row per case and one column per time point (or a list of SequenceData objects for a multichannel HMM).
- A clear research question about latent dynamics: recurring hidden regimes, mixture clusters with different dynamics, or covariate-dependent transition/emission probabilities.
If you are new to HMMs, start with the Conceptual Guides.
Model Types in This Module
| Model | Main build function | Typical question |
|---|---|---|
| Basic HMM | build_hmm() | What latent regimes generate the observed sequences? |
| Mixture HMM (MHMM) | build_mhmm() | Are there distinct subgroups, each with its own HMM? |
| Non-homogeneous HMM (NHMM) | build_nhmm() | Do transition or emission probabilities depend on covariates or time? |
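To make the mixture HMM row concrete: once each subgroup has its own HMM, a sequence's cluster membership probability is proportional to the mixture weight times the sequence's likelihood under that cluster's HMM. The sketch below shows only this arithmetic in NumPy. The log-likelihood values, the weights, and the helper cluster_posteriors are hypothetical and are not part of Sequenzo's API.

```python
import numpy as np

# Hypothetical inputs: log-likelihood of each sequence under each cluster's HMM
# (rows = sequences, columns = clusters) and the prior mixture weights.
log_lik_by_cluster = np.array([[-12.0, -15.0],
                               [-20.0, -18.5],
                               [-9.0, -9.0]])
mixture_weights = np.array([0.7, 0.3])


def cluster_posteriors(log_lik, weights):
    """Posterior P(cluster | sequence), computed stably in log space."""
    joint = log_lik + np.log(weights)            # log(weight * likelihood)
    joint = joint - joint.max(axis=1, keepdims=True)  # guard against underflow
    post = np.exp(joint)
    return post / post.sum(axis=1, keepdims=True)     # normalize each row to 1


posteriors = cluster_posteriors(log_lik_by_cluster, mixture_weights)
most_likely_cluster = posteriors.argmax(axis=1)
print(posteriors)
print(most_likely_cluster)
```

When a sequence is equally likely under every cluster, as in the third row, its posterior falls back to the mixture weights.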
All fitted models share the same high-level workflow:
- Build the model structure (build_hmm, build_mhmm, or build_nhmm).
- Fit parameters with EM or numerical optimization (fit_model, fit_mhmm, fit_nhmm, or fit_model_advanced).
- Predict latent states or cluster membership (predict, predict_mhmm, posterior_probs, posterior_probs_mhmm).
- Evaluate with AIC/BIC (aic, bic, compare_models) and optional bootstrap (bootstrap_model).
- Visualize estimated parameters (plot_hmm, plot_mhmm).
A Beginner-Friendly Workflow (Basic HMM)
- Prepare sequence data. Build SequenceData with the correct state alphabet and time columns.
- Choose the number of hidden states. Start with a small range (for example 3–6) and compare models with BIC.
- Build and fit. Call build_hmm() then fit_model(). Set random_state for reproducible initialization.
- Inspect fit quality. Check model.log_likelihood, model.converged, and model.n_iter.
- Decode latent paths. Use predict() for the Viterbi path or posterior_probs() for state probabilities at each time point.
- Plot parameters. Use plot_hmm(model, which='network') for a seqHMM-style graph, or which='all' for matrix views.
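For readers who want to see what the decoding step computes, here is a minimal NumPy sketch of Viterbi decoding: it finds the single most probable hidden-state path for one integer-coded sequence. It is a generic textbook version under assumed toy parameters, not the code behind predict(). The function name viterbi and all values are hypothetical.

```python
import numpy as np

# Hypothetical toy parameters: 2 hidden states, 3 observed states.
initial_probs = np.array([0.6, 0.4])
transition_probs = np.array([[0.9, 0.1],
                             [0.2, 0.8]])
emission_probs = np.array([[0.7, 0.2, 0.1],
                           [0.1, 0.3, 0.6]])


def viterbi(obs, pi, A, B):
    """Most probable hidden-state path for one sequence, computed in log space."""
    n_steps, n_states = len(obs), len(pi)
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = np.empty((n_steps, n_states))            # best log-probability ending in each state
    backptr = np.zeros((n_steps, n_states), dtype=int)
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, n_steps):
        scores = delta[t - 1][:, None] + log_A       # scores[i, j]: best path into i, then i -> j
        backptr[t] = scores.argmax(axis=0)           # best predecessor for each state j
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.empty(n_steps, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_steps - 2, -1, -1):             # trace the back-pointers
        path[t] = backptr[t + 1, path[t + 1]]
    return path


observed = [0, 0, 1, 2, 2]                           # hypothetical observed sequence
path = viterbi(observed, initial_probs, transition_probs, emission_probs)
print(path)
```

The Viterbi path is one jointly optimal sequence of hidden states, whereas posterior state probabilities describe the uncertainty at each time point separately, so the two can disagree at individual positions.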
How This Differs from Distance-Based Analysis
Distance-based tools (clustering, discrepancy analysis, group comparison with LRT/BIC on distances) summarize how different observed sequences are. HMMs instead estimate a generative mechanism: latent states, transitions, and emissions.
Use HMMs when you want interpretable latent dynamics or mixture clusters defined by Markov structure. Use distance-based methods when your substantive question is about overall trajectory dissimilarity without a latent-state story.
Included Pages
- Conceptual Guides: Markov chain, HMM, and MHMM in plain language
- Sequenzo–seqHMM Mapping: correspondence with the R seqHMM package
- Basic HMM: build_hmm(), fit_model(), predict(), posterior_probs(), plot_hmm()
- Mixture HMM: build_mhmm(), fit_mhmm(), predict_mhmm(), posterior_probs_mhmm(), plot_mhmm()
- Non-homogeneous HMM: build_nhmm(), fit_nhmm()
- Model comparison: aic(), bic(), compare_models()
- Simulation: simulate_hmm(), simulate_mhmm(), simulate_nhmm()
- Advanced tools: bootstrap_model(), fit_model_advanced()
Known Limitations (vs. R seqHMM)
- build_mhmm() currently supports single-channel data only (no multichannel list input; no covariate formula on the mixture weights during estimation).
- NHMM formula support is additive only (no interactions, lags, or transforms yet).
- Multichannel EM is implemented in pure Python and can be slow on large samples (>500 sequences).
- Some seqHMM utilities (for example hidden_paths, stacked_sequence_plot, trim_model) are not yet exported in Sequenzo.
Authors
Code: Yuqi Liang, Yapeng Wei
Documentation: Yuqi Liang
References
Helske, S., & Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software, 88(3), 1–32. https://doi.org/10.18637/jss.v088.i03