AccumulatorABC

class coffea.processor.AccumulatorABC[source]

Bases: object

Abstract base class for an accumulator

Accumulators are abstract objects that enable the reduce stage of the typical map-reduce scaleout that we do in Coffea. One concrete example is a histogram. The idea is that an accumulator definition holds enough information to be able to create an empty accumulator (the identity() method) and add two compatible accumulators together (the add() method). The former is not strictly necessary, but helps with book-keeping. Here we show an example usage of a few accumulator types. An arbitrary-depth nesting of dictionary accumulators is supported, much like the behavior of directories in ROOT hadd.

After defining an accumulator:

from coffea.processor import dict_accumulator, column_accumulator, defaultdict_accumulator
from coffea.hist import Hist, Bin
import numpy as np

adef = dict_accumulator({
    'cutflow': defaultdict_accumulator(int),
    'pt': Hist("counts", Bin("pt", "$p_T$", 100, 0, 100)),
    'final_pt': column_accumulator(np.zeros(shape=(0,))),
})

Notice that this function does not mutate adef:

def fill(n):
    ptvals = np.random.exponential(scale=30, size=n)
    cut = ptvals > 200.
    acc = adef.identity()
    acc['cutflow']['pt>200'] += cut.sum()
    acc['pt'].fill(pt=ptvals)
    acc['final_pt'] += column_accumulator(ptvals[cut])
    return acc

As such, we can execute it several times in parallel and reduce the result:

import concurrent.futures
with concurrent.futures.ThreadPoolExecutor() as executor:
    outputs = executor.map(fill, [2000, 2000])

combined = sum(outputs, adef.identity())

Derived classes must implement

identity(): returns a new object of same type as self, such that self + self.identity() == self
add(other): adds an object of same type as self to self

Concrete implementations are then provided for __add__, __radd__, and __iadd__.

Methods Summary

`add`(other)	Add another accumulator to this one in-place
`identity`()	Identity of the accumulator

Methods Documentation

abstract add(other)[source]: Add another accumulator to this one in-place

abstract identity()[source]

Identity of the accumulator

A value such that any other value added to it will return the other value