soft_classification_variables()

soft_classification_variables() prepares a FANNY membership matrix for regression by dropping one reference cluster column. The remaining K − 1 columns serve as continuous membership-degree predictors, following the soft classification approach of Helske et al. (2024).

Function Usage

```python
soft_classification_variables(
    U,
    *,
    reference=0,
    ids=None,
    as_dataframe=False,
    cluster_names=None,
)
```

R / Literature Parameter Mapping

| Sequenzo | R / packages | Notes |
| --- | --- | --- |
| U | cluster::fanny membership matrix | From fanny_membership() |
| Omitted reference column | Baseline cluster in regression | Helske Table 1: one membership column omitted |
| Predictor type | "Membership degree", continuous | Not dummies |

Entry Parameters

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| U | Yes | ndarray | Membership matrix of shape (n, K). Entries must be nonnegative and each row must sum to 1. |
| reference | No | int | 0-based index of the reference cluster to omit. Default 0. |
| ids | No | list / Index / None | Row index used when as_dataframe=True. |
| as_dataframe | No | bool | If True, return a DataFrame; otherwise return a NumPy array. Default False. |
| cluster_names | No | list / None | Optional list of K names; the reference cluster's name is omitted from the output. Default columns: P_1, P_2, … |

What It Returns

An np.ndarray of shape (n, K − 1), or a pd.DataFrame of the same shape when as_dataframe=True.

Each retained column is the membership degree for one non-reference cluster.
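
As a rough illustration of the transformation described above (not Sequenzo's actual implementation), the validation and column drop might be sketched in plain NumPy as follows. The helper name `drop_reference_column` and the tolerance `atol=1e-8` are assumptions made for this sketch only.

```python
import numpy as np


def drop_reference_column(U, reference=0, atol=1e-8):
    """Hypothetical helper: validate a membership matrix and drop one column."""
    U = np.asarray(U, dtype=float)
    if U.ndim != 2 or U.shape[1] < 2:
        raise ValueError("U must be 2-D with at least two cluster columns")
    if (U < 0).any():
        raise ValueError("membership degrees must be nonnegative")
    # The tolerance is an assumption; the library's own check may differ.
    if not np.allclose(U.sum(axis=1), 1.0, atol=atol):
        raise ValueError("each row of U must sum to 1")
    if not 0 <= reference < U.shape[1]:
        raise ValueError("reference index out of range")
    # Remove the reference cluster's column, keeping the other K - 1.
    return np.delete(U, reference, axis=1)


# Toy membership matrix: n = 2 cases, K = 3 clusters.
U = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.6, 0.3]])
X = drop_reference_column(U, reference=0)
print(X.shape)
```

With K = 3 clusters, the result keeps the two non-reference columns, so each row holds the membership degrees for clusters 1 and 2 only.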

Example

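```python
from sequenzo import fanny_membership, soft_classification_variables

# Assumes `diss` (a pairwise dissimilarity matrix) and `seqdata`
# (the sequence data it was computed from) already exist.
U, _ = fanny_membership(diss, k=5, m=1.4)

# Omit cluster 0 ("stable") so that it acts as the regression baseline.
X_soft = soft_classification_variables(
    U,
    reference=0,
    as_dataframe=True,
    ids=seqdata.ids,
    cluster_names=["stable", "unstable", "inactive", "late_entry", "mixed"],
)

print(X_soft.shape)  # (n, 4): the four non-reference clusters remain
```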
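
To make the labeled output more concrete, here is a small stand-alone sketch using NumPy and pandas directly rather than Sequenzo. The toy matrix, the case ids, and the variable names are hypothetical; it only illustrates how the reference cluster's name drops out of the column labels.

```python
import numpy as np
import pandas as pd

# Toy membership matrix: 2 cases, 3 clusters (values are made up).
U_toy = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.6, 0.3]])
names = ["stable", "unstable", "inactive"]
case_ids = ["a01", "a02"]  # hypothetical row identifiers
reference = 0

# Keep every name except the reference cluster's.
kept = [name for i, name in enumerate(names) if i != reference]

X_toy = pd.DataFrame(
    np.delete(U_toy, reference, axis=1),  # drop the reference column
    index=case_ids,
    columns=kept,
)
print(X_toy)
```

The resulting frame has one row per case and one column per non-reference cluster, here "unstable" and "inactive".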

R Counterpart

  • Closest R workflow: use fanny membership columns directly in regression after dropping a reference column.
  • Mapping note: WeightedCluster provides no equivalent wrapper; Sequenzo validates row sums and builds the omitted-reference predictors explicitly.

Notes

  • Rows of U must sum to 1 within floating-point tolerance.
  • The omitted reference column is not ignored conceptually; it is represented implicitly because all membership columns sum to 1. One column is dropped only to avoid perfect collinearity in regression.
  • Coefficients describe how the outcome changes with membership in each non-reference cluster relative to the omitted reference.
  • Soft membership still forces membership degrees to sum to 1 across clusters, so it may not distinguish genuinely mixed cases from cases that fit every cluster poorly as clearly as representativeness does.
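
The collinearity and interpretation points in these notes can be checked numerically. The sketch below uses synthetic data only: the Dirichlet-sampled memberships, the cluster-level outcome values `mu`, and the noise-free outcome are all assumptions chosen for illustration. If the outcome is y = Σ_k U_k · mu_k, then substituting U_0 = 1 − Σ_{k≠0} U_k gives y = mu_0 + Σ_{k≠0} U_k · (mu_k − mu_0), so the intercept should recover the reference level and each slope a contrast against it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 4

# Synthetic memberships: nonnegative rows that sum to 1.
U = rng.dirichlet(np.ones(K), size=n)

# Assumed cluster-specific outcome levels and a noise-free outcome.
mu = np.array([1.0, 3.0, -2.0, 0.5])
y = U @ mu

ones = np.ones((n, 1))
X_full = np.hstack([ones, U])        # intercept + all K columns
X_red = np.hstack([ones, U[:, 1:]])  # intercept + K - 1 columns (cluster 0 omitted)

# The K membership columns sum to the intercept column, so X_full
# has K + 1 columns but only rank K; X_red has full column rank K.
rank_full = np.linalg.matrix_rank(X_full)
rank_red = np.linalg.matrix_rank(X_red)
print(rank_full, rank_red)

# Ordinary least squares on the reduced design.
coef, *_ = np.linalg.lstsq(X_red, y, rcond=None)

# Intercept = reference level; slopes = contrasts against the reference.
expected = np.concatenate([[mu[0]], mu[1:] - mu[0]])
print(np.round(coef, 6))
print(expected)
```

Because the outcome here has no noise, the fitted coefficients match the contrasts up to floating-point error; with real data they would be estimates of the same quantities.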

Authors

Code: Yuqi Liang

Documentation: Yuqi Liang

References

Helske, S., Helske, J., & Chihaya, G. K. (2024). From sequences to variables: Rethinking the relationship between sequences and outcomes. Sociological Methodology, 54(1), 27–51.