bff.normalization_pd

bff.normalization_pd(df, scaler=None, columns=None, suffix=None, new_type=np.float32, **kwargs)

Normalize columns of a pandas DataFrame using the given scaler.

If columns is not provided, all numerical columns are normalized.

If the original column names are integers (RangeIndex), the columns cannot be replaced. Instead, new columns are created whose names are the same integers converted to strings.

By default, if no suffix is provided, the normalized values overwrite the original columns.
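
The rules above (column selection, naming of the target columns, forwarding of keyword arguments) can be illustrated with a small sketch. This is not the bff source code: normalize_sketch is a hypothetical helper written only for illustration, and it assumes that the scaler is passed as a class and that the target name is the string form of the original name followed by the optional suffix.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def normalize_sketch(df, scaler=None, columns=None, suffix=None,
                     new_type=np.float32, **kwargs):
    """Hypothetical re-implementation for illustration; not the bff source."""
    # Assumption: the scaler is given as a class and MinMaxScaler is the default.
    scaler_cls = MinMaxScaler if scaler is None else scaler
    # No columns given: fall back to every numerical column.
    if columns is None:
        cols = list(df.select_dtypes(include='number').columns)
    else:
        cols = list(columns)
    # Extra keyword arguments go to the scaler constructor.
    scaled = scaler_cls(**kwargs).fit_transform(df[cols].to_numpy())
    # Work on a copy so the input DataFrame is left untouched.
    out = df.copy()
    for i, col in enumerate(cols):
        # Assumption: target name = str(original name) + optional suffix.
        # 'x' stays 'x' (overwritten); the integer 0 becomes the new column '0'.
        target = f'{col}{suffix or ""}'
        out[target] = scaled[:, i].astype(new_type)
    return out


df = pd.DataFrame({'x': [123, 27, 38, 45, 67], 'y': [456, 45.4, 32, 34, 90]})
with_suffix = normalize_sketch(df, suffix='_norm', feature_range=(0, 2))
int_named = normalize_sketch(pd.DataFrame([[1, 2], [3, 4]]))
```

Under these assumptions, with_suffix keeps x and y and gains x_norm and y_norm, while int_named keeps the integer-named columns 0 and 1 and gains the string-named columns '0' and '1'.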

Parameters
  • df (pd.DataFrame) – DataFrame to normalize.

  • scaler (TransformerMixin, default MinMaxScaler) – Scaler class from sklearn to use for the normalization.

  • columns (sequence of str, default None) – Columns to normalize. If None, normalize all numerical columns.

  • suffix (str, default None) – If provided, store the normalized values in new columns named with this suffix instead of overwriting the originals.

  • new_type (np.dtype, default np.float32) – Data type of the normalized columns.

  • **kwargs – Additional keyword arguments passed to the sklearn scaler (e.g. feature_range for MinMaxScaler).

Returns

DataFrame with the normalized columns.

Return type

pd.DataFrame

Examples

>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.preprocessing import StandardScaler
>>> from bff import normalization_pd
>>> data = {'x': [123, 27, 38, 45, 67], 'y': [456, 45.4, 32, 34, 90]}
>>> df = pd.DataFrame(data)
>>> df
     x      y
0  123  456.0
1   27   45.4
2   38   32.0
3   45   34.0
4   67   90.0
>>> df_std = df.pipe(normalization_pd, columns=['x'], scaler=StandardScaler)
>>> df_std
          x      y
0  1.847198  456.0
1 -0.967580   45.4
2 -0.645053   32.0
3 -0.439809   34.0
4  0.205244   90.0
>>> df_min_max = normalization_pd(df, suffix='_norm', feature_range=(0, 2),
...                               new_type=np.float64)
>>> df_min_max
     x      y    x_norm    y_norm
0  123  456.0  2.000000  2.000000
1   27   45.4  0.000000  0.063208
2   38   32.0  0.229167  0.000000
3   45   34.0  0.375000  0.009434
4   67   90.0  0.833333  0.273585
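
The values printed in the examples above can be checked by hand. The sketch below uses only the Python standard library and recomputes the x column: the standard score uses the population standard deviation (as sklearn's StandardScaler does), and the min-max formula maps the column onto the interval given by feature_range. The variable names are chosen for illustration only.

```python
from statistics import fmean, pstdev

x = [123, 27, 38, 45, 67]

# Standard score: (value - mean) / population standard deviation (ddof=0).
mu, sigma = fmean(x), pstdev(x)
x_std = [(v - mu) / sigma for v in x]

# Min-max scaling onto feature_range=(lo, hi):
# (value - min) / (max - min) * (hi - lo) + lo
lo, hi = 0, 2
x_min, x_max = min(x), max(x)
x_norm = [(v - x_min) / (x_max - x_min) * (hi - lo) + lo for v in x]

print([round(v, 4) for v in x_std])
print([round(v, 4) for v in x_norm])
```

Up to rounding, x_std matches the x column of df_std and x_norm matches the x_norm column of df_min_max.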