Skip to content

get_kob_decomposition_bootstrap()

get_kob_decomposition_bootstrap() wraps get_kob_decomposition() with bootstrap resampling to obtain standard errors and percentile confidence intervals for the total gap and its components.

By default, resampling is stratified by group so each bootstrap draw keeps the original group sizes.

Function Usage

python
get_kob_decomposition_bootstrap(
    y,
    group,
    X,
    variable_names=None,
    term_ids=None,
    reference="group0",
    majority_owner=None,
    coefficient_owner_by_column=None,
    group0_value=None,
    group1_value=None,
    normalize_categorical=False,
    categorical_terms=None,
    category_ids=None,
    n_categories_by_term=None,
    owner_by_category_by_term=None,
    drop_missing=False,
    n_boot=500,
    random_state=None,
    confidence_level=0.95,
    stratified=True,
)

Entry Parameters

All parameters from get_kob_decomposition() apply, plus:

ParameterRequiredTypeDescription
n_bootintNumber of bootstrap draws. Must be ≥ 2. Default: 500.
random_stateintSeed for numpy.random.Generator.
confidence_levelfloatCI coverage in (0, 1). Default: 0.95.
stratifiedboolResample within each group separately. Default: True.

See the KOB page for y, group, X, and reference parameters.

What It Returns

A KOBBootstrapResult:

FieldTypeDescription
point_estimateKOBDecompositionResultDecomposition on the full sample
standard_errorsdictSE for total_gap, explained, unexplained_returns, unexplained_intercept
confidence_intervalsdictPercentile CIs for the same scalars
by_column_standard_errorspd.DataFrameSE per column for explained and returns
by_column_confidence_intervalspd.DataFrameCI per column
by_term_standard_errorspd.DataFrameSE per term
by_term_confidence_intervalspd.DataFrameCI per term
n_bootintDraws used
confidence_levelfloatCI level

Examples

Step 1: Point decomposition (optional preview)

python
from sequenzo.decomposition import get_kob_decomposition

point = get_kob_decomposition(
    y=y,
    group=group,
    X=X,
    group0_value="men",
    group1_value="women",
)
print(point.explained)

Step 2: Bootstrap

python
from sequenzo.decomposition import get_kob_decomposition_bootstrap

boot = get_kob_decomposition_bootstrap(
    y=y,
    group=group,
    X=X,
    group0_value="men",
    group1_value="women",
    n_boot=500,
    random_state=42,
    confidence_level=0.95,
    stratified=True,
)

Step 3: Report uncertainty

python
se = boot.standard_errors["explained"]
lo, hi = boot.confidence_intervals["explained"]
print(f"Explained: {boot.point_estimate.explained:.4f} [{lo:.4f}, {hi:.4f}], SE={se:.4f}")

print(boot.by_column_confidence_intervals.head())

Notes

  • Stratified bootstrap requires both groups to be non-empty.
  • With stratified=False, draws are simple row resamples of the full data.
  • Bootstrap recomputes the full decomposition on each draw, including categorical normalization settings.
  • For sequence-cluster typologies, prefer get_sa_kob_decomposition_bootstrap().

Authors

Code: Yuqi Liang

Documentation: Yuqi Liang

References

Jann, B. (2008). The Blinder–Oaxaca decomposition for linear regression models. The Stata Journal, 8(4), 453–479.

Efron, B., & Tibshirani, R. J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.