bff.normalization_pd

bff.normalization_pd(df, scaler=None, columns=None, suffix=None, new_type=<class 'numpy.float32'>, **kwargs)
Normalize columns of a pandas DataFrame using the given scaler.
If columns is not provided, all numerical columns are normalized.
If the original column labels are integers (RangeIndex), they cannot be replaced in place. In that case, new columns are created whose names are the same integers converted to strings.
By default, when no suffix is provided, the original columns are overwritten.
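The behaviour described above can be approximated with plain pandas and scikit-learn. The following is a minimal sketch, not bff's actual implementation: the helper name normalize_sketch is hypothetical, and it assumes the scaler is passed as a class that is instantiated with the extra keyword arguments. It does not reproduce the special handling of integer (RangeIndex) column labels.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler


def normalize_sketch(df, scaler=MinMaxScaler, columns=None, suffix=None,
                     new_type=np.float32, **kwargs):
    """Hypothetical stand-in for the documented behaviour (not bff's code)."""
    out = df.copy()
    # Default to every numerical column when none are given.
    if columns is None:
        columns = out.select_dtypes(include='number').columns.tolist()
    # Assumption: the scaler is a class, built here with **kwargs.
    scaled = scaler(**kwargs).fit_transform(out[columns]).astype(new_type)
    # Without a suffix, the original columns are overwritten.
    targets = [f'{c}{suffix}' for c in columns] if suffix else columns
    for i, target in enumerate(targets):
        out[target] = scaled[:, i]
    return out


df = pd.DataFrame({'x': [123, 27, 38, 45, 67], 'y': [456, 45.4, 32, 34, 90]})
result = normalize_sketch(df, suffix='_norm', feature_range=(0, 2),
                          new_type=np.float64)
```

Here result keeps x and y untouched and gains x_norm and y_norm, because a suffix was given. Calling the helper without suffix would instead replace x and y with their scaled values.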
- Parameters
df (pd.DataFrame) – DataFrame to normalize.
scaler (TransformerMixin, default MinMaxScaler) – scikit-learn scaler to use for the normalization.
columns (sequence of str, default None) – Columns to normalize. If None, normalize all numerical columns.
suffix (str, default None) – If provided, create the normalization in new columns having this suffix.
new_type (np.dtype, default np.float32) – New type for the columns.
**kwargs – Additional keyword arguments passed to the scikit-learn scaler.
- Returns
DataFrame with the normalized columns.
- Return type
pd.DataFrame
Examples
>>> import numpy as np
>>> import pandas as pd
>>> from sklearn.preprocessing import StandardScaler
>>> data = {'x': [123, 27, 38, 45, 67], 'y': [456, 45.4, 32, 34, 90]}
>>> df = pd.DataFrame(data)
>>> df
     x      y
0  123  456.0
1   27   45.4
2   38   32.0
3   45   34.0
4   67   90.0
>>> df_std = df.pipe(normalization_pd, columns=['x'], scaler=StandardScaler)
>>> df_std
          x      y
0  1.847198  456.0
1 -0.967580   45.4
2 -0.645053   32.0
3 -0.439809   34.0
4  0.205244   90.0
>>> df_min_max = normalization_pd(df, suffix='_norm', feature_range=(0, 2),
...                               new_type=np.float64)
>>> df_min_max
     x      y    x_norm    y_norm
0  123  456.0  2.000000  2.000000
1   27   45.4  0.000000  0.063208
2   38   32.0  0.229167  0.000000
3   45   34.0  0.375000  0.009434
4   67   90.0  0.833333  0.273585
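The numbers in the examples can be checked by hand. The sketch below recomputes the x column with NumPy only, assuming the usual definitions: standard scaling subtracts the mean and divides by the population standard deviation (ddof=0, as scikit-learn's StandardScaler does), and min-max scaling maps the column minimum and maximum onto the ends of feature_range. The variable names are illustrative.

```python
import numpy as np

x = np.array([123, 27, 38, 45, 67], dtype=np.float64)

# Standard scaling: zero mean, unit population standard deviation (ddof=0).
x_std = (x - x.mean()) / x.std(ddof=0)

# Min-max scaling onto the range (lo, hi), here feature_range=(0, 2).
lo, hi = 0, 2
x_min_max = (x - x.min()) / (x.max() - x.min()) * (hi - lo) + lo
```

For instance, the mean of x is 60 and the third value is 38, so its min-max result is (38 - 27) / (123 - 27) * 2, which agrees with the x_norm entry in row 2 of the example output.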