DaskExecutor

class coffea.processor.DaskExecutor(status: bool = True, unit: str = 'items', desc: str = 'Processing', compression: int | None = 1, function_name: str | None = None, client: dask.distributed.Client | None = None, treereduction: int = 20, priority: int = 0, retries: int = 3, heavy_input: bytes | None = None, use_dataframes: bool = False, worker_affinity: bool = False)[source]

Bases: ExecutorBase

Execute using dask futures

Parameters:
  • items (list) – List of input arguments

  • function (callable) – A function to be called on each input, which returns an accumulator instance

  • accumulator (Accumulatable) – An accumulator to collect the output of the function

  • client (distributed.client.Client) – A dask distributed client instance

  • treereduction (int, optional) – Tree reduction factor for output accumulators (default: 20)

  • status (bool, optional) – If true (default), enable progress bar

  • compression (int, optional) – Compress accumulator outputs in flight with LZ4, at level specified (default 1) Set to None for no compression.

  • priority (int, optional) – Task priority, default 0

  • retries (int, optional) – Number of retries for failed tasks (default: 3)

  • heavy_input (serializable, optional) – Any value placed here will be broadcast to workers and joined to input items in a tuple (item, heavy_input) that is passed to function.

  • function_name (str, optional) – Name of the function being passed

  • use_dataframes (bool, optional) –

    Retrieve output as a distributed Dask DataFrame (default: False). The outputs of individual tasks must be Pandas DataFrames.

    Note

    If heavy_input is set, function is assumed to be pure.

Attributes Summary

client

heavy_input

priority

retries

treereduction

use_dataframes

worker_affinity

Methods Summary

__call__(items, function, accumulator)

Call self as a function.

Attributes Documentation

client: dask.distributed.Client | None = None
heavy_input: bytes | None = None
priority: int = 0
retries: int = 3
treereduction: int = 20
use_dataframes: bool = False
worker_affinity: bool = False

Methods Documentation

__call__(items: Iterable, function: Callable, accumulator: Addable | MutableSet | MutableMapping)[source]

Call self as a function.