DaskExecutor
- class coffea.processor.DaskExecutor(status: bool = True, unit: str = 'items', desc: str = 'Processing', compression: int | None = 1, function_name: str | None = None, client: dask.distributed.Client | None = None, treereduction: int = 20, priority: int = 0, retries: int = 3, heavy_input: bytes | None = None, use_dataframes: bool = False, worker_affinity: bool = False)[source]
Bases: ExecutorBase
Execute using Dask futures
- Parameters:
items (list) – List of input arguments
function (callable) – A function to be called on each input, which returns an accumulator instance
accumulator (Accumulatable) – An accumulator to collect the output of the function
client (distributed.client.Client) – A dask distributed client instance
treereduction (int, optional) – Tree reduction factor for output accumulators (default: 20)
status (bool, optional) – If true (default), enable progress bar
compression (int, optional) – Compress accumulator outputs in flight with LZ4, at the level specified (default: 1). Set to None for no compression.
priority (int, optional) – Task priority (default: 0)
retries (int, optional) – Number of retries for failed tasks (default: 3)
heavy_input (serializable, optional) – Any value placed here will be broadcast to workers and joined to input items in a tuple (item, heavy_input) that is passed to function.
function_name (str, optional) – Name of the function being passed
use_dataframes (bool, optional) –
Retrieve output as a distributed Dask DataFrame (default: False). The outputs of individual tasks must be Pandas DataFrames.
Note
If heavy_input is set, function is assumed to be pure.
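The treereduction parameter controls how many partial accumulators are merged per reduction step. A minimal pure-Python sketch of this idea (not coffea's actual implementation, which runs distributed on Dask workers) looks like:

```python
from functools import reduce

def tree_reduce(accumulators, factor=20):
    """Illustrative sketch of tree reduction: repeatedly merge the list
    of addable accumulators in groups of `factor` until one remains."""
    while len(accumulators) > 1:
        accumulators = [
            reduce(lambda a, b: a + b, accumulators[i:i + factor])
            for i in range(0, len(accumulators), factor)
        ]
    return accumulators[0]

# With 100 integer "accumulators" and factor 20, the first pass yields
# 5 partial sums and the second pass combines those into one result.
total = tree_reduce(list(range(100)), factor=20)
print(total)  # 4950
```

A larger factor means fewer reduction rounds but more work per merge task; the default of 20 is a middle ground.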
Methods Summary
__call__(items, function, accumulator) – Call self as a function.
Methods Documentation
- __call__(items: Iterable, function: Callable, accumulator: Addable | MutableSet | MutableMapping)[source]
Call self as a function.
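Conceptually, the executor applies function to every item (pairing each item with heavy_input when that is set) and merges the returned accumulators by addition. A serial sketch of that contract, ignoring the distribution, retries, compression, and tree reduction that DaskExecutor adds, might look like (run_serially is a hypothetical name, not part of coffea):

```python
from collections import Counter

def run_serially(items, function, accumulator, heavy_input=None):
    """Serial sketch of the executor contract: call `function` on each
    item (as `(item, heavy_input)` when heavy_input is set) and sum the
    per-item results into `accumulator`."""
    for item in items:
        arg = (item, heavy_input) if heavy_input is not None else item
        accumulator = accumulator + function(arg)
    return accumulator

# Toy "processor": each item yields a Counter, and Counters merge by +,
# mimicking how accumulator outputs are combined.
out = run_serially(["a b", "b c"], lambda s: Counter(s.split()), Counter())
print(dict(out))  # {'a': 1, 'b': 2, 'c': 1}
```

In real use, function is a coffea processor's work function and the items are chunks of input data; the executor's job is only to schedule the calls and combine their outputs.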