bff.pipe_multiprocessing_pd

bff.pipe_multiprocessing_pd(df, func, *, nb_proc=None, **kwargs)

Compute function on DataFrame with nb_proc processes.

The given function must return a new DataFrame. Rows must be independent: the result for a row must not depend on a value computed from the whole DataFrame.
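
The row-independence requirement can be illustrated without any multiprocessing. The sketch below is only an illustration: `double_x`, `center_x` and `apply_in_chunks` are hypothetical names that are not part of bff, and the chunking is done sequentially as a stand-in for the per-process split. A row-wise function gives the same result whether it sees the whole DataFrame or one chunk at a time, whereas a function that uses a global statistic (here, the mean) does not.

```python
import pandas as pd


def double_x(df):
    """Row-wise: each output row depends only on the same input row."""
    return df.assign(y=df["x"] * 2)


def center_x(df):
    """Not row-wise: the mean is taken over whatever rows are passed in."""
    return df.assign(y=df["x"] - df["x"].mean())


def apply_in_chunks(df, func, nb_chunks):
    """Apply func to consecutive row blocks and concatenate the results.

    Hypothetical sequential stand-in for a per-process split.
    """
    size = -(-len(df) // nb_chunks)  # ceiling division
    chunks = [df.iloc[i:i + size] for i in range(0, len(df), size)]
    return pd.concat([func(chunk) for chunk in chunks])


df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]})

# Same result chunked or not: safe to run on separate chunks.
safe_matches = apply_in_chunks(df, double_x, 2).equals(double_x(df))

# Each chunk is centered on its own mean, so the results differ.
unsafe_matches = apply_in_chunks(df, center_x, 2).equals(center_x(df))
```

Here `safe_matches` is true and `unsafe_matches` is false, which is why a function like `center_x` should not be passed as func.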

By default, the function uses as many processes as there are CPUs available on the machine.

The DataFrame is split into nb_proc parts, and each part is processed by a different process. The results are then concatenated and returned.
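
The split, compute and concatenate steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's actual implementation: `split_rows`, `parallel_apply_sketch` and `scale_column` are hypothetical names, the split is assumed to be into consecutive row blocks of near-equal size, and the single-process shortcut is a choice made for this sketch.

```python
import multiprocessing
from functools import partial

import numpy as np
import pandas as pd


def split_rows(df, nb_chunks):
    """Split df into nb_chunks consecutive row blocks of near-equal size."""
    positions = np.array_split(np.arange(len(df)), nb_chunks)
    return [df.iloc[idx] for idx in positions]


def parallel_apply_sketch(df, func, *, nb_proc=None, **kwargs):
    """Hypothetical stand-in for the split / compute / concatenate steps."""
    if nb_proc is None:
        nb_proc = multiprocessing.cpu_count()
    chunks = split_rows(df, nb_proc)
    # Bind the extra keyword arguments so each worker calls func(chunk, **kwargs).
    worker = partial(func, **kwargs)
    if nb_proc == 1:
        # A single chunk does not need a pool (assumption of this sketch).
        return pd.concat([worker(chunk) for chunk in chunks])
    with multiprocessing.Pool(processes=nb_proc) as pool:
        results = pool.map(worker, chunks)  # results keep the chunk order
    return pd.concat(results)


def scale_column(df, *, column, factor):
    """Example row-wise func: multiply one column by a constant."""
    return df.assign(**{column: df[column] * factor})
```

A call would look like `parallel_apply_sketch(df, scale_column, nb_proc=2, column="a", factor=10)`. Because the chunks are sent to worker processes, the function passed in has to be picklable, which in practice means defined at module top level rather than as a lambda. On platforms that start workers with the spawn method, the call should sit under an `if __name__ == "__main__":` guard.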

Parameters
  • df (pd.DataFrame) – DataFrame on which the function is applied.

  • func (function) – Function that takes a DataFrame as input and returns a new DataFrame.

  • nb_proc (Union[int, None], default None) – Number of processes to use. If not provided, multiprocessing.cpu_count() processes are used.

  • **kwargs – Additional keyword arguments to be passed to func.

Returns

The DataFrame computed by func, obtained by concatenating the results from each process.

Return type

pd.DataFrame