simulate_mhmm()

simulate_mhmm() draws sequences from a mixture hidden Markov model (MHMM). Each sequence is first assigned to a cluster and is then generated from that cluster's HMM parameters.
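The two-stage process described above can be sketched in plain NumPy. This is only an illustration of the generative idea under simple assumptions (fixed mixture weights, integer-coded states and symbols); the function name `sketch_simulate_mhmm` and its return format are hypothetical and are not Sequenzo's implementation.

```python
import numpy as np

def sketch_simulate_mhmm(n_sequences, sequence_length, initial_probs,
                         transition_probs, emission_probs, cluster_probs,
                         seed=None):
    """Illustrative two-stage sampler (hypothetical, not Sequenzo's code)."""
    rng = np.random.default_rng(seed)
    n_clusters = len(cluster_probs)

    # Stage 1: draw one cluster index per sequence from the mixture weights.
    clusters = rng.choice(n_clusters, size=n_sequences, p=cluster_probs)

    states = np.empty((n_sequences, sequence_length), dtype=int)
    observations = np.empty((n_sequences, sequence_length), dtype=int)

    for i, k in enumerate(clusters):
        pi, A, B = initial_probs[k], transition_probs[k], emission_probs[k]
        # Stage 2: run the chosen cluster's HMM forward in time.
        s = rng.choice(len(pi), p=pi)              # initial hidden state
        for t in range(sequence_length):
            if t > 0:
                s = rng.choice(A.shape[0], p=A[s])  # hidden state transition
            states[i, t] = s
            observations[i, t] = rng.choice(B.shape[1], p=B[s])  # emitted symbol
    return clusters, states, observations


clusters, states, obs = sketch_simulate_mhmm(
    n_sequences=5,
    sequence_length=10,
    initial_probs=[np.array([0.5, 0.5]), np.array([0.3, 0.7])],
    transition_probs=[np.array([[0.7, 0.3], [0.3, 0.7]]),
                      np.array([[0.8, 0.2], [0.2, 0.8]])],
    emission_probs=[np.array([[0.9, 0.1], [0.1, 0.9]]),
                    np.array([[0.7, 0.3], [0.3, 0.7]])],
    cluster_probs=np.array([0.6, 0.4]),
    seed=42,
)
```

Here the observations are integer symbol indices; mapping them to labels such as "A" and "B" would be a separate lookup step.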

Function Usage

```python
simulate_mhmm(
    n_sequences,
    n_clusters,
    initial_probs,
    transition_probs,
    emission_probs,
    cluster_probs=None,
    sequence_length=None,
    alphabet=None,
    state_names=None,
    cluster_names=None,
    formula=None,
    data=None,
    coefficients=None,
    random_state=None
)
```

seqHMM Parameter Mapping

| Sequenzo | seqHMM simulate_mhmm() |
| --- | --- |
| Per-cluster initial_probs, transition_probs, emission_probs | Lists of cluster-specific matrices |
| cluster_probs | Fixed mixture weights |
| formula, data, coefficients | Covariate-dependent cluster probabilities (multinomial logit) |

Entry Parameters

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| n_sequences | | int | Number of sequences. |
| n_clusters | | int | Number of mixture components. |
| initial_probs | | List[ndarray] | One initial distribution per cluster. |
| transition_probs | | List[ndarray] | One transition matrix per cluster. |
| emission_probs | | List[ndarray] | One emission matrix per cluster. |
| cluster_probs | ✗* | ndarray / None | Fixed weights (n_clusters,). |
| sequence_length | | int | Length of each sequence. |
| alphabet | | List[str] / None | Observed symbols. |
| state_names | | List[List[str]] / None | Hidden state names per cluster. |
| cluster_names | | List[str] / None | Cluster labels. |
| formula | ✗* | str / None | Formula for covariate-dependent mixture, e.g. "~ x1 + x2". |
| data | ✗* | DataFrame / None | Covariate data (one row per sequence). |
| coefficients | ✗* | ndarray / None | Multinomial logit coefficients for cluster assignment. |
| random_state | | int / None | RNG seed. |

* Provide either cluster_probs or all three of formula, data, and coefficients.
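The either/or rule for the cluster specification can be expressed as a small check. The helper below, `check_cluster_spec`, is hypothetical and only illustrates the rule; how Sequenzo itself validates these arguments, and how it reacts when both modes are supplied, is not stated here.

```python
def check_cluster_spec(cluster_probs=None, formula=None, data=None,
                       coefficients=None):
    """Hypothetical helper: report which cluster specification is in use."""
    covariate_args = (formula, data, coefficients)
    n_given = sum(arg is not None for arg in covariate_args)

    if cluster_probs is not None and n_given == 0:
        return "fixed"        # fixed mixture weights
    if cluster_probs is None and n_given == 3:
        return "covariates"   # multinomial logit on covariates
    # Anything else mixes the two modes or is incomplete (assumed to be an error).
    raise ValueError(
        "Provide either cluster_probs or all of formula, data, and coefficients."
    )


mode = check_cluster_spec(cluster_probs=[0.6, 0.4])
```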

What It Returns

A dict with observations, states, clusters (cluster index per sequence), and observations_df.

Example

Fixed mixture weights

```python
import numpy as np
from sequenzo.seqhmm import simulate_mhmm

sim = simulate_mhmm(
    n_sequences=20,
    n_clusters=2,
    initial_probs=[np.array([0.5, 0.5]), np.array([0.3, 0.7])],
    transition_probs=[
        np.array([[0.7, 0.3], [0.3, 0.7]]),
        np.array([[0.8, 0.2], [0.2, 0.8]]),
    ],
    emission_probs=[
        np.array([[0.9, 0.1], [0.1, 0.9]]),
        np.array([[0.7, 0.3], [0.3, 0.7]]),
    ],
    cluster_probs=np.array([0.6, 0.4]),
    sequence_length=20,
    alphabet=["A", "B"],
    random_state=42,
)
```

Covariate-dependent clusters

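```python
import numpy as np
import pandas as pd
from sequenzo.seqhmm import simulate_mhmm

# Seed the covariate draws so the example is reproducible.
rng = np.random.default_rng(0)

# One row of covariates per sequence (30 rows for n_sequences=30).
data = pd.DataFrame({
    "covariate_1": rng.random(30),
    "covariate_2": rng.choice(["A", "B"], size=30),
})

# Multinomial logit coefficients, one column per cluster.
# Rows presumably follow the formula terms: intercept, covariate_1, covariate_2.
# The first column is the reference cluster and is set to zero.
coefs = np.array([
    [0, -1.5],
    [0, 3.0],
    [0, -0.7],
])

sim = simulate_mhmm(
    n_sequences=30,
    n_clusters=2,
    initial_probs=[...],     # elided: one array per cluster, as in the fixed-weights example
    transition_probs=[...],  # elided: one matrix per cluster
    emission_probs=[...],    # elided: one matrix per cluster
    sequence_length=20,
    formula="~ covariate_1 + covariate_2",
    data=data,
    coefficients=coefs,
    alphabet=["A", "B"],
    random_state=42,
)
```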

R Counterpart

  • Closest R function: seqHMM simulate_mhmm()

Notes

  • Covariate-based mixture simulation is available here even though build_mhmm() does not yet estimate covariate-dependent weights.
  • The first cluster is the reference level, so the first column of the coefficient matrix is typically all zeros.
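To make the reference-level convention concrete, the sketch below turns a coefficient matrix into per-sequence cluster probabilities with a standard multinomial logit (softmax). The design matrix `X` and its column coding (intercept, covariate_1, a 0/1 dummy for covariate_2 == "B") are assumptions for illustration; how Sequenzo builds the design matrix from `formula` and `data` is not shown here.

```python
import numpy as np

# Assumed design matrix: intercept, covariate_1, dummy for covariate_2 == "B".
X = np.array([
    [1.0, 0.2, 0.0],
    [1.0, 0.9, 1.0],
])

# One column per cluster; the first (reference) column is all zeros.
coefs = np.array([
    [0.0, -1.5],
    [0.0, 3.0],
    [0.0, -0.7],
])

def cluster_probabilities(X, coefs):
    """Row-wise softmax of the linear predictors X @ coefs."""
    eta = X @ coefs
    eta = eta - eta.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(eta)
    return weights / weights.sum(axis=1, keepdims=True)

probs = cluster_probabilities(X, coefs)  # shape (n_sequences, n_clusters)
```

Because the reference column is zero, each non-reference column gives the log-odds of that cluster relative to the first one.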

Authors

Code: Yuqi Liang

Documentation: Yuqi Liang

References

Helske, S., & Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software, 88(3), 1–32.