simulate_mhmm()
simulate_mhmm() simulates sequences from a mixture hidden Markov model (MHMM). Each sequence is first assigned to a cluster and then generated from that cluster's HMM parameters.
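The two-stage process described above can be sketched in plain NumPy. This is only an illustration of the idea, not Sequenzo's implementation: the function name `two_stage_sketch` and its integer-coded outputs are assumptions made for this example.

```python
import numpy as np


def two_stage_sketch(n_sequences, sequence_length, cluster_probs,
                     initial_probs, transition_probs, emission_probs, seed=None):
    """Hypothetical helper: sample cluster labels, then one HMM path per sequence."""
    rng = np.random.default_rng(seed)
    # Stage 1: draw one cluster index per sequence from the mixture weights.
    clusters = rng.choice(len(cluster_probs), size=n_sequences, p=cluster_probs)
    states = np.empty((n_sequences, sequence_length), dtype=int)
    observations = np.empty((n_sequences, sequence_length), dtype=int)
    for i, k in enumerate(clusters):
        init, trans, emiss = initial_probs[k], transition_probs[k], emission_probs[k]
        # Stage 2: run the chosen cluster's HMM forward in time.
        s = rng.choice(len(init), p=init)
        for t in range(sequence_length):
            if t > 0:
                s = rng.choice(trans.shape[1], p=trans[s])
            states[i, t] = s
            # Emit an observed symbol (as an integer code) from the current state.
            observations[i, t] = rng.choice(emiss.shape[1], p=emiss[s])
    return clusters, states, observations


clusters, states, observations = two_stage_sketch(
    n_sequences=20,
    sequence_length=20,
    cluster_probs=np.array([0.6, 0.4]),
    initial_probs=[np.array([0.5, 0.5]), np.array([0.3, 0.7])],
    transition_probs=[
        np.array([[0.7, 0.3], [0.3, 0.7]]),
        np.array([[0.8, 0.2], [0.2, 0.8]]),
    ],
    emission_probs=[
        np.array([[0.9, 0.1], [0.1, 0.9]]),
        np.array([[0.7, 0.3], [0.3, 0.7]]),
    ],
    seed=42,
)
```

The sketch returns integer codes; mapping them to symbols such as "A" and "B" would be a separate lookup step.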
Function Usage
```python
simulate_mhmm(
    n_sequences,
    n_clusters,
    initial_probs,
    transition_probs,
    emission_probs,
    cluster_probs=None,
    sequence_length=None,
    alphabet=None,
    state_names=None,
    cluster_names=None,
    formula=None,
    data=None,
    coefficients=None,
    random_state=None
)
```

seqHMM Parameter Mapping
| Sequenzo | seqHMM simulate_mhmm() |
|---|---|
| Per-cluster initial_probs, transition_probs, emission_probs | Lists of cluster-specific matrices |
| cluster_probs | Fixed mixture weights |
| formula, data, coefficients | Covariate-dependent cluster probabilities (multinomial logit) |
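To make the multinomial logit row concrete, here is a minimal sketch of how a coefficient matrix turns covariates into per-sequence cluster probabilities. The design-matrix layout (an intercept column, then one column per covariate term) and the helper name `cluster_probabilities` are assumptions for illustration; how Sequenzo builds the design matrix from `formula` and `data` is not shown here.

```python
import numpy as np


def cluster_probabilities(design_matrix, coefficients):
    """Hypothetical helper: row-wise softmax of the linear predictors."""
    eta = design_matrix @ coefficients            # shape (n_sequences, n_clusters)
    eta = eta - eta.max(axis=1, keepdims=True)    # subtract row max for numerical stability
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


# Assumed layout: columns are intercept, covariate_1, dummy for covariate_2 == "B".
X = np.array([
    [1.0, 0.2, 0.0],
    [1.0, 0.9, 1.0],
])
# One column per cluster; the first (reference) cluster's column is all zeros.
coefs = np.array([
    [0.0, -1.5],
    [0.0, 3.0],
    [0.0, -0.7],
])
probs = cluster_probabilities(X, coefs)  # each row sums to 1
```

Each row of `probs` could then serve as the mixture weights for drawing that sequence's cluster.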
Entry Parameters
| Parameter | Required | Type | Description |
|---|---|---|---|
| n_sequences | ✓ | int | Number of sequences. |
| n_clusters | ✓ | int | Number of mixture components. |
| initial_probs | ✓ | List[ndarray] | One initial distribution per cluster. |
| transition_probs | ✓ | List[ndarray] | One transition matrix per cluster. |
| emission_probs | ✓ | List[ndarray] | One emission matrix per cluster. |
| cluster_probs | ✗* | ndarray / None | Fixed weights (n_clusters,). |
| sequence_length | ✓ | int | Length of each sequence. |
| alphabet | ✗ | List[str] / None | Observed symbols. |
| state_names | ✗ | List[List[str]] / None | Hidden state names per cluster. |
| cluster_names | ✗ | List[str] / None | Cluster labels. |
| formula | ✗* | str / None | Formula for covariate-dependent mixture, e.g. "~ x1 + x2". |
| data | ✗* | DataFrame / None | Covariate data (one row per sequence). |
| coefficients | ✗* | ndarray / None | Multinomial logit coefficients for cluster assignment. |
| random_state | ✗ | int / None | RNG seed. |
*Provide either cluster_probs or (formula + data + coefficients).
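Before calling the simulator it can help to confirm that the per-cluster inputs are valid probability arrays with matching shapes. The checker below is a hypothetical helper written for this page, not part of Sequenzo; it only encodes the general requirements that probabilities are non-negative and that each distribution sums to 1.

```python
import numpy as np


def find_parameter_problems(initial_probs, transition_probs, emission_probs,
                            cluster_probs=None, atol=1e-8):
    """Hypothetical helper: return a list of human-readable problems (empty if none)."""
    problems = []
    n_clusters = len(initial_probs)
    if not (len(transition_probs) == len(emission_probs) == n_clusters):
        return ["initial, transition and emission lists differ in length"]
    for k in range(n_clusters):
        init = np.asarray(initial_probs[k], dtype=float)
        trans = np.asarray(transition_probs[k], dtype=float)
        emiss = np.asarray(emission_probs[k], dtype=float)
        n_states = init.shape[0]
        if trans.shape != (n_states, n_states):
            problems.append(f"cluster {k}: transition matrix is not {n_states}x{n_states}")
        if emiss.ndim != 2 or emiss.shape[0] != n_states:
            problems.append(f"cluster {k}: emission matrix needs {n_states} rows")
        for name, arr in (("initial", init), ("transition", trans), ("emission", emiss)):
            # Every distribution (vector or matrix row) must be non-negative and sum to 1.
            if np.any(arr < 0) or not np.allclose(arr.sum(axis=-1), 1.0, atol=atol):
                problems.append(f"cluster {k}: {name} probabilities are not normalized")
    if cluster_probs is not None:
        weights = np.asarray(cluster_probs, dtype=float)
        if weights.shape != (n_clusters,) or not np.isclose(weights.sum(), 1.0, atol=atol):
            problems.append("cluster_probs must have shape (n_clusters,) and sum to 1")
    return problems


good = find_parameter_problems(
    initial_probs=[np.array([0.5, 0.5]), np.array([0.3, 0.7])],
    transition_probs=[np.array([[0.7, 0.3], [0.3, 0.7]]), np.array([[0.8, 0.2], [0.2, 0.8]])],
    emission_probs=[np.array([[0.9, 0.1], [0.1, 0.9]]), np.array([[0.7, 0.3], [0.3, 0.7]])],
    cluster_probs=np.array([0.6, 0.4]),
)
bad = find_parameter_problems(
    initial_probs=[np.array([0.5, 0.6])],             # sums to 1.1
    transition_probs=[np.array([[0.7, 0.3], [0.3, 0.7]])],
    emission_probs=[np.array([[0.9, 0.1], [0.1, 0.9]])],
)
```

Returning a list rather than raising on the first failure lets all problems be reported in one pass.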
What It Returns
A dict with observations, states, clusters (cluster index per sequence), and observations_df.
Example
Fixed mixture weights
```python
import numpy as np
from sequenzo.seqhmm import simulate_mhmm

sim = simulate_mhmm(
    n_sequences=20,
    n_clusters=2,
    initial_probs=[np.array([0.5, 0.5]), np.array([0.3, 0.7])],
    transition_probs=[
        np.array([[0.7, 0.3], [0.3, 0.7]]),
        np.array([[0.8, 0.2], [0.2, 0.8]]),
    ],
    emission_probs=[
        np.array([[0.9, 0.1], [0.1, 0.9]]),
        np.array([[0.7, 0.3], [0.3, 0.7]]),
    ],
    cluster_probs=np.array([0.6, 0.4]),
    sequence_length=20,
    alphabet=["A", "B"],
    random_state=42,
)
```

Covariate-dependent clusters
```python
import numpy as np
import pandas as pd
from sequenzo.seqhmm import simulate_mhmm

# Covariates: one row per sequence (30 rows for 30 sequences).
data = pd.DataFrame({
    "covariate_1": np.random.rand(30),
    "covariate_2": np.random.choice(["A", "B"], size=30),
})

# Columns: one per cluster; the first (reference) cluster's column is zero.
coefs = np.array([
    [0, -1.5],
    [0, 3.0],
    [0, -0.7],
])

sim = simulate_mhmm(
    n_sequences=30,
    n_clusters=2,
    initial_probs=[...],
    transition_probs=[...],
    emission_probs=[...],
    sequence_length=20,
    formula="~ covariate_1 + covariate_2",
    data=data,
    coefficients=coefs,
    alphabet=["A", "B"],
    random_state=42,
)
```

R Counterpart
- Closest R function: seqHMM::simulate_mhmm()
Notes
- Covariate-based mixture simulation is available here even though build_mhmm() does not yet estimate covariate-dependent weights.
- The first cluster is the reference level in the coefficient matrix (the first column is typically zero).
Authors
Code: Yuqi Liang
Documentation: Yuqi Liang
References
Helske, S., & Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software, 88(3), 1–32.