
build_nhmm()

build_nhmm() creates an unfitted non-homogeneous hidden Markov model (NHMM). Initial, transition, and emission probabilities can depend on covariates or time through coefficient matrices and a softmax link.
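
To make the softmax link concrete, the sketch below turns a covariate vector into one row of transition probabilities. It is a generic multinomial-logit illustration, not Sequenzo's internal code: the coefficient layout (one row of coefficients per destination state, with an intercept as the first covariate) and the names softmax, transition_row, and coefs are assumptions made for this example.

```python
import math

def softmax(scores):
    """Map real-valued scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def transition_row(coefs, x):
    """Probabilities of moving to each state, given covariate vector x.

    coefs[k] holds the (hypothetical) coefficients for destination state k;
    the linear predictor for state k is the dot product of coefs[k] and x.
    """
    scores = [sum(c * xi for c, xi in zip(row, x)) for row in coefs]
    return softmax(scores)

# Hypothetical coefficients for 3 destination states and covariates [1, time].
coefs = [
    [0.0, 0.0],    # state 0: reference row, all zeros
    [0.5, -0.1],   # state 1: less likely as time grows
    [-1.0, 0.2],   # state 2: more likely as time grows
]

early = transition_row(coefs, [1.0, 0.0])   # time = 0
late = transition_row(coefs, [1.0, 10.0])   # time = 10
```

Because the probabilities depend on the covariate vector, the same coefficients give a different transition row at each time point, which is what makes the model non-homogeneous.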

Function Usage

python
build_nhmm(
    observations,
    n_states,
    X=None,
    emission_formula=None,
    initial_formula=None,
    transition_formula=None,
    data=None,
    id_var=None,
    time_var=None,
    eta_pi=None,
    eta_A=None,
    eta_B=None,
    state_names=None,
    random_state=None
)

seqHMM Parameter Mapping

| Sequenzo | seqHMM build_nhmm() |
| --- | --- |
| observations | stslist |
| n_states | Number of hidden states |
| X | Covariate array (manual specification) |
| emission_formula, initial_formula, transition_formula | Formula terms for each parameter block |
| data, id_var, time_var | Covariate data frame and index columns |
| eta_pi, eta_A, eta_B | Coefficient matrices for initial, transition, emission |

Entry Parameters

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| observations | ✓ | SequenceData | Observed sequences. |
| n_states | ✓ | int | Number of hidden states (> 1). |
| X | ✗* | ndarray / None | Covariate tensor (n_sequences, n_timepoints, n_covariates). |
| emission_formula | ✗* | str / Formula / None | Formula for emission probabilities, e.g. "~ age + education". |
| initial_formula | ✗ | str / Formula / None | Formula for initial state probabilities. |
| transition_formula | ✗ | str / Formula / None | Formula for transition probabilities. |
| data | ✗* | DataFrame / None | Covariate table (required with formulas if X is omitted). |
| id_var | ✗* | str / None | Sequence ID column in data. |
| time_var | ✗* | str / None | Time index column in data. |
| eta_pi, eta_A, eta_B | ✗ | ndarray / None | Optional starting coefficients. |
| state_names | ✗ | List[str] / None | Hidden state labels. |
| random_state | ✗ | int / None | Seed for random coefficient initialization. |

*Provide either X or (data + id_var + time_var + at least one formula).

What It Returns

An NHMM object holding the covariate tensor X, the coefficient slots eta_pi, eta_A, and eta_B, and log_likelihood=None until the model is fitted.

Example

Method 1: Direct covariate tensor

python
import numpy as np
from sequenzo.seqhmm import build_nhmm, fit_nhmm

# seq is an existing SequenceData object (see the observations parameter).
n_sequences = len(seq.sequences)
n_timepoints = max(len(s) for s in seq.sequences)

X = np.zeros((n_sequences, n_timepoints, 1))
for i in range(n_sequences):
    for t in range(len(seq.sequences[i])):
        X[i, t, 0] = t  # time covariate

nhmm = build_nhmm(seq, n_states=4, X=X, random_state=42)
nhmm = fit_nhmm(nhmm, verbose=True)
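
The nested loop in Method 1 can also be written with NumPy broadcasting. The sketch below uses a hypothetical array of sequence lengths in place of seq.sequences, and assumes that padded positions should stay at zero, as in the loop version.

```python
import numpy as np

# Hypothetical sequence lengths standing in for len(s) for s in seq.sequences.
lengths = np.array([3, 5, 2])
n_sequences = len(lengths)
n_timepoints = int(lengths.max())

# time_index[t] = t; compare against each sequence's length to mark observed cells.
time_index = np.arange(n_timepoints)
observed = time_index[None, :] < lengths[:, None]   # shape (n_sequences, n_timepoints)

# Time covariate where observed, 0 in the padded tail; add the covariate axis last.
X = np.where(observed, time_index[None, :], 0).astype(float)[:, :, None]
```

The resulting X has shape (n_sequences, n_timepoints, 1), matching the layout described for the X parameter.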

Method 2: Formula interface

python
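import pandas as pd
from sequenzo.seqhmm import build_nhmm

# Long format: one row per id × time combination, so all columns share one length.
# Replace each [...] placeholder with your own values.
covariate_df = pd.DataFrame({
    "id": [...],    # sequence id, repeated for each of its time points
    "time": [...],  # time index within each sequence
    "age": [...],
    "education": [...],
})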

nhmm = build_nhmm(
    seq,
    n_states=4,
    emission_formula="~ age + education",
    data=covariate_df,
    id_var="id",
    time_var="time",
    random_state=42,
)
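
The [...] placeholders in Method 2 stand for long-format columns with one row per id × time combination. As a rough sketch of that layout, the code below builds such columns in plain Python for hypothetical sequence lengths. The age and education values are made up for illustration, and the 0-based time index is an assumption, since the expected starting value is not stated here.

```python
# Hypothetical inputs: sequence lengths and one baseline record per sequence.
lengths = [3, 2]
baseline = [
    {"age": 25, "education": 12},
    {"age": 40, "education": 16},
]

columns = {"id": [], "time": [], "age": [], "education": []}
for seq_id, (n_t, person) in enumerate(zip(lengths, baseline)):
    for t in range(n_t):
        columns["id"].append(seq_id)
        columns["time"].append(t)                          # 0-based time index (assumption)
        columns["age"].append(person["age"] + t)           # covariate that varies over time
        columns["education"].append(person["education"])   # constant within a sequence

n_rows = len(columns["id"])
```

Passing columns to pd.DataFrame(columns) gives a long-format table with one row per id × time pair.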

R Counterpart

  • Closest R function: seqHMM build_nhmm()
  • Mapping note: Formula syntax follows R-style additive terms; interactions and lags are not yet supported in Sequenzo.

Notes

  • Follow with fit_nhmm().
  • Multichannel NHMM is not yet supported.
  • Supported formulas are additive (e.g. "~ x1 + x2"); no *, lag(), or transforms yet.
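
Since only additive terms are supported, one possible workaround is to precompute an interaction or lagged term as its own column and list it as an ordinary additive term. The sketch below does this in plain Python on hypothetical long-format columns. The column names age_x_education and age_lag1 are invented for this example, and it has not been verified against Sequenzo that derived columns of this kind are handled as expected.

```python
# Hypothetical long-format columns (one row per id × time), sorted by id then time.
columns = {
    "id":        [0, 0, 0, 1, 1],
    "time":      [0, 1, 2, 0, 1],
    "age":       [25, 26, 27, 40, 41],
    "education": [12, 12, 12, 16, 16],
}

# Interaction term: elementwise product of the two covariates.
columns["age_x_education"] = [
    a * e for a, e in zip(columns["age"], columns["education"])
]

# Lag-1 term within each sequence; the first time point has no predecessor,
# so it reuses its own value here (a simple choice, not the only one).
lagged = []
for i, (seq_id, age) in enumerate(zip(columns["id"], columns["age"])):
    same_seq = i > 0 and columns["id"][i - 1] == seq_id
    lagged.append(columns["age"][i - 1] if same_seq else age)
columns["age_lag1"] = lagged

# Additive formula string that lists the derived columns as plain terms.
formula = "~ age + education + age_x_education + age_lag1"
```

The formula string stays purely additive, so it uses none of the unsupported operators.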

Authors

Code: Yuqi Liang

Documentation: Yuqi Liang

References

Helske, S., & Helske, J. (2019). Mixture hidden Markov models for sequence data: The seqHMM package in R. Journal of Statistical Software, 88(3), 1–32.