bff.cast_to_category_pd

bff.cast_to_category_pd(df, deep=True)

Automatically converts the columns of a pandas DataFrame that are worth storing as category dtype.

To be cast, a column must not be numerical, must hold hashable values, and its unique values must make up less than 50% of its entries.
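The three conditions above can be sketched with plain pandas. This is only an illustration of the documented rule, not bff's actual implementation: the function name `sketch_cast_to_category` is hypothetical, and treating a `TypeError` from `nunique()` as "not hashable" is an assumption.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def sketch_cast_to_category(df: pd.DataFrame, deep: bool = True) -> pd.DataFrame:
    """Hypothetical sketch of the documented rule (not bff's own code)."""
    result = df.copy(deep=deep)
    for col in result.columns:
        series = result[col]
        # Condition 1: skip numerical columns.
        if is_numeric_dtype(series):
            continue
        # Condition 2: counting unique values requires hashable entries;
        # assumption: unhashable entries (e.g. lists) raise TypeError here.
        try:
            n_unique = series.nunique()
        except TypeError:
            continue
        # Condition 3: unique values are less than 50% of the entries.
        if len(series) > 0 and n_unique / len(series) < 0.5:
            result[col] = series.astype('category')
    return result
```

Under this sketch, a column with 2 distinct values over 5 rows (40%) would be cast, while a column where every row is distinct (100%) would be left unchanged.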

Parameters
  • df (pd.DataFrame) – DataFrame with the columns to cast.

  • deep (bool, default True) – Whether to perform a deep copy of the original DataFrame.

Returns

Optimized copy of the input DataFrame.

Return type

pd.DataFrame

Examples

>>> import pandas as pd
>>> columns = ['name', 'age', 'country']
>>> df = pd.DataFrame([['John', 24, 'China'],
...                    ['Mary', 20, 'China'],
...                    ['Jane', 25, 'Switzerland'],
...                    ['Greg', 23, 'China'],
...                    ['James', 28, 'China']],
...                   columns=columns)
>>> df
    name  age      country
0   John   24        China
1   Mary   20        China
2   Jane   25  Switzerland
3   Greg   23        China
4  James   28        China
>>> df.dtypes
name       object
age         int64
country    object
dtype: object
>>> import bff
>>> df_optimized = bff.cast_to_category_pd(df)
>>> df_optimized.dtypes
name         object
age           int64
country    category
dtype: object
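To see why a low-cardinality column is worth storing as category, the memory footprint before and after the cast can be compared. The sketch below uses only pandas (`astype('category')` and `memory_usage(deep=True)`) and does not call bff; the repeated `country` values are made-up illustration data, and the exact byte counts depend on the pandas version and platform.

```python
import pandas as pd

# Illustration data: 5000 rows but only two distinct values.
countries = pd.Series(
    ['China', 'China', 'Switzerland', 'China', 'China'] * 1000,
    name='country',
)
as_category = countries.astype('category')

# deep=True also counts the memory held by the string objects themselves.
original_bytes = countries.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f'original: {original_bytes} bytes')
print(f'category: {category_bytes} bytes')
```

The category version stores each distinct string once plus a small integer code per row, which is why the saving grows with the number of repeated values.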