
replace_cluster_id_by_labels()

replace_cluster_id_by_labels() is a helper function that lets you replace numeric cluster IDs in a DataFrame with custom labels and optionally rename the identifier and cluster columns.

This is especially useful after running clustering, when you want to give clusters meaningful names (e.g., "Group A", "Group B") instead of numeric IDs.

Function Usage

python
new_df = replace_cluster_id_by_labels(
    df,
    mapping={1: "A", 2: "B", 3: "C"},  # optional
    new_cluster_column_name="Cluster",  # optional
    new_id_column_name="Entity ID"      # optional
)

Entry Parameters

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| df | Yes | DataFrame | Input DataFrame containing at least "Entity ID" and "Cluster" columns. |
| mapping | No | dict | Dictionary mapping cluster IDs (keys) to labels (values). Unmapped clusters remain unchanged. |
| new_cluster_column_name | No | str | New name for the cluster column. Default = "Cluster". |
| new_id_column_name | No | str | New name for the entity ID column. Default = "Entity ID". |

What It Does

  • Checks that your DataFrame contains both "Entity ID" and "Cluster" columns.
  • Validates that all keys in mapping exist in the cluster IDs of your DataFrame.
  • Replaces cluster IDs with custom labels based on the provided mapping.
    • Unmapped clusters remain as their original numeric values.
  • Renames the identifier and cluster columns if you provide new names.
  • Returns the updated DataFrame.
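The steps above can be sketched in plain pandas. This is a hypothetical re-implementation for illustration only, not the library's actual code: the name `replace_cluster_id_by_labels_sketch`, the choice of `ValueError` for missing columns, the use of `warnings.warn` for unknown mapping keys, and the use of `Series.replace` are all assumptions.

```python
import warnings

import pandas as pd


def replace_cluster_id_by_labels_sketch(
    df,
    mapping=None,
    new_cluster_column_name="Cluster",
    new_id_column_name="Entity ID",
):
    """Hypothetical sketch of the documented steps (not the library's code)."""
    # Step 1: check that both expected columns are present.
    missing = {"Entity ID", "Cluster"} - set(df.columns)
    if missing:
        # Assumption: a missing column raises ValueError.
        raise ValueError(f"Missing required columns: {sorted(missing)}")

    result = df.copy()  # leave the caller's DataFrame untouched

    if mapping:
        # Step 2: flag mapping keys that match no cluster ID in the data.
        unknown = set(mapping) - set(result["Cluster"].unique())
        if unknown:
            # Assumption: unknown keys produce a warning rather than an error.
            warnings.warn(f"Mapping keys not found in 'Cluster': {sorted(unknown)}")
        # Step 3: replace mapped IDs; unmapped IDs keep their original value.
        result["Cluster"] = result["Cluster"].replace(mapping)

    # Step 4: rename both columns (a no-op when the defaults are used).
    return result.rename(
        columns={"Entity ID": new_id_column_name, "Cluster": new_cluster_column_name}
    )


demo = pd.DataFrame({"Entity ID": [1, 2, 3], "Cluster": [1, 2, 3]})
labeled = replace_cluster_id_by_labels_sketch(
    demo, mapping={1: "A", 2: "B"}, new_cluster_column_name="Group"
)
print(labeled)
```

In this sketch, cluster 3 has no entry in the mapping, so it stays as the number 3 in the renamed "Group" column, while the input DataFrame `demo` is left unmodified.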

Key Features

  • Custom labels: turn numeric cluster IDs into interpretable names.
  • Validation: warns you if a mapping key does not exist in your DataFrame.
  • Flexible renaming: lets you rename "Entity ID" and "Cluster" columns to fit your workflow.
  • Safe handling: unmapped clusters stay untouched instead of breaking the DataFrame.
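To see why leaving unmapped clusters untouched matters, the short comparison below contrasts two standard pandas approaches on a partial mapping. It does not call replace_cluster_id_by_labels() and makes no claim about which approach the function uses internally; the variable names are illustrative.

```python
import pandas as pd

clusters = pd.Series([1, 2, 3], name="Cluster")
partial_mapping = {1: "A"}  # clusters 2 and 3 are deliberately left unmapped

# Series.map with a dict turns every value that has no key into NaN.
mapped = clusters.map(partial_mapping)

# Series.replace only swaps the values listed in the dict and keeps the rest.
replaced = clusters.replace(partial_mapping)

print(mapped.tolist())
print(replaced.tolist())
```

With `map`, the unmapped clusters are lost as missing values; with `replace`, they survive as their original IDs, which matches the "safe handling" behavior described above.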

Examples

1. Replace cluster IDs with labels

python
import pandas as pd

original_df = pd.DataFrame({
    "Entity ID": [1, 2, 3],
    "Cluster": [1, 2, 3]
})

mapping = {1: "A", 2: "B", 3: "C"}

new_df = replace_cluster_id_by_labels(original_df, mapping)
print(new_df)

Output:

   Entity ID Cluster
0          1       A
1          2       B
2          3       C

2. Replace IDs and rename columns

python
# Reuses original_df from Example 1.
new_df = replace_cluster_id_by_labels(
    original_df,
    mapping={1: "A", 2: "B", 3: "C"},
    new_cluster_column_name="Group",
    new_id_column_name="Person",
)
print(new_df)

Output:

   Person Group
0       1     A
1       2     B
2       3     C

Authors

Code: Yuqi Liang

Documentation: Yuqi Liang

Edited by: Yuqi Liang

Released under the BSD-3-Clause License.