simulate_nhmm()

simulate_nhmm() generates sequences from a non-homogeneous hidden Markov model (NHMM) specified with formulas and coefficient matrices. Initial, transition, and emission probabilities can vary over time and with covariates.
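To make covariate-dependent probabilities concrete, the sketch below computes one probability vector from a coefficient matrix and a covariate row using a multinomial-logit (softmax) link. The link function, the zero-valued reference category, the helper names softmax and row_probs, and the coefficient values are all illustrative assumptions; this page does not state how simulate_nhmm() parameterises its coefficients internally.

```python
import math


def softmax(etas):
    """Map linear predictors to probabilities that sum to 1."""
    m = max(etas)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


def row_probs(coef_rows, covariates):
    """Probabilities for one data row.

    coef_rows: one coefficient list per category, intercept first (assumed layout).
    covariates: covariate values for the row, without the intercept.
    """
    x = [1.0] + list(covariates)
    etas = [sum(b * v for b, v in zip(row, x)) for row in coef_rows]
    return softmax(etas)


# Hypothetical coefficients for three categories under "~ x1 + x2"
coefs = [
    [0.0, 0.0, 0.0],    # reference category (assumed convention)
    [0.5, 0.3, -1.0],
    [-0.5, -0.2, 0.8],
]

early = row_probs(coefs, [1, 0])   # x1 = 1, x2 = 0
late = row_probs(coefs, [10, 0])   # x1 = 10, x2 = 0
print([round(p, 3) for p in early])
print([round(p, 3) for p in late])
```

Because x1 has a positive coefficient for the second category, that category becomes more likely as x1 grows, which is the sense in which the probabilities are non-homogeneous.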

Function Usage

```python
simulate_nhmm(
    n_states,
    emission_formula,
    data,
    id_var,
    time_var,
    initial_formula=None,
    transition_formula=None,
    coefs=None,
    init_sd=None,
    random_state=None
)
```

seqHMM Parameter Mapping

| Sequenzo | seqHMM simulate_nhmm() |
| --- | --- |
| n_states | Number of hidden states |
| emission_formula, initial_formula, transition_formula | Formula specification per parameter block |
| data, id_var, time_var | Long-format covariate and response scaffold |
| coefs | Dictionary of coefficient matrices (initial_probs, transition_probs, emission_probs) |
| init_sd | SD for random coefficient draws when coefs is None |

Entry Parameters

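| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| n_states | Yes | int | Number of hidden states (must be greater than 1). |
| emission_formula | Yes | str | Formula for emission probabilities, e.g. "~ x1 + x2". |
| data | Yes | DataFrame | Long-format data. Must include the response columns (their values are replaced during simulation), the ID column, the time column, and all covariates. |
| id_var | Yes | str | Name of the sequence ID column. |
| time_var | Yes | str | Name of the time index column. |
| initial_formula | No | str / None | Formula for initial state probabilities. Defaults to intercept-only "~ 1". |
| transition_formula | No | str / None | Formula for transition probabilities. Defaults to intercept-only "~ 1". |
| coefs | No | dict / None | Known coefficients. Drawn at random if omitted. |
| init_sd | No | float / None | Standard deviation of the randomly drawn coefficients. Defaults to 2.0 when coefs is None. |
| random_state | No | int / None | Seed for the random number generator. |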
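The interaction between coefs, init_sd, and random_state can be illustrated with a small standalone sketch. It assumes the random coefficients are zero-mean normal draws with standard deviation init_sd; the helper draw_coefs and the 3 × 3 shape are hypothetical and are not part of Sequenzo.

```python
import random


def draw_coefs(n_rows, n_cols, init_sd=2.0, random_state=None):
    """Draw an n_rows x n_cols coefficient matrix from N(0, init_sd^2).

    Assumption: zero-mean normal draws. The actual distribution used by
    simulate_nhmm() is not documented on this page.
    """
    rng = random.Random(random_state)  # seeded generator for reproducibility
    return [[rng.gauss(0.0, init_sd) for _ in range(n_cols)]
            for _ in range(n_rows)]


a = draw_coefs(3, 3, init_sd=2.0, random_state=42)
b = draw_coefs(3, 3, init_sd=2.0, random_state=42)
flat = draw_coefs(3, 3, init_sd=0.0, random_state=42)

print(a == b)   # same seed, same draws
print(flat)     # zero spread collapses every coefficient to 0.0
```

Under this assumption, a larger init_sd spreads the coefficients further from zero and so tends to produce more sharply peaked probabilities, while a fixed random_state makes the draws repeatable.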

What It Returns

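A dict containing the simulated observations, the simulated hidden states, and a long-format data frame. The example below reads the observations through the "observations" key.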

Example

```python
import pandas as pd
from sequenzo.seqhmm import simulate_nhmm

# Long-format scaffold: one row per person × time
rows = []
for sid in range(5):
    for t in range(1, 11):
        rows.append({"id": sid, "time": t, "y": "A", "x1": t, "x2": sid % 2})
data = pd.DataFrame(rows)

# Coefficients are drawn at random because coefs is not supplied
sim = simulate_nhmm(
    n_states=3,
    emission_formula="~ x1 + x2",
    data=data,
    id_var="id",
    time_var="time",
    random_state=42,
)

# Inspect the first two simulated observation sequences
print(sim["observations"][:2])
```

R Counterpart

  • Closest R function: seqHMM simulate_nhmm()

Notes

  • data defines sequence structure (IDs, times, alphabet from response columns); observed values in response columns are overwritten.
  • Pair with build_nhmm() + fit_nhmm() to test recovery of known parameters.
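Since data determines the sequence structure and the alphabet, it can help to inspect the scaffold before simulating. The sketch below works on the list of row dicts, before it is turned into a DataFrame, and uses only the standard library. The helper summarize_scaffold and the response column name "y" are hypothetical and not part of Sequenzo.

```python
from collections import Counter


def summarize_scaffold(rows, id_var, time_var, response_var):
    """Check for one row per (id, time) pair and summarise the scaffold."""
    pairs = Counter((r[id_var], r[time_var]) for r in rows)
    duplicates = [pair for pair, n in pairs.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate (id, time) rows: {duplicates[:5]}")
    lengths = Counter(r[id_var] for r in rows)          # rows per sequence
    alphabet = sorted({r[response_var] for r in rows})  # distinct response values
    return {
        "n_sequences": len(lengths),
        "lengths": dict(lengths),
        "alphabet": alphabet,
    }


# Scaffold with two response symbols so that the alphabet has more than one entry
rows = [
    {"id": sid, "time": t, "y": "AB"[(sid + t) % 2], "x1": t, "x2": sid % 2}
    for sid in range(5)
    for t in range(1, 11)
]

summary = summarize_scaffold(rows, "id", "time", "y")
print(summary["n_sequences"], summary["alphabet"])
```

If the alphabet is taken from the values present in the response column, as the note above suggests, a scaffold whose response column holds a single constant value would yield a one-symbol alphabet, so this summary is worth reviewing first.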

Authors

Code: Yuqi Liang

Documentation: Yuqi Liang

References

Helske, S., & Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software, 88(3), 1–32.