PackedSelection

class coffea.analysis_tools.PackedSelection(dtype='uint32')[source]

Bases: object

Store several boolean arrays in a compact manner

This class can store several boolean arrays in a memory-efficient manner and evaluate arbitrary combinations of boolean requirements in a CPU-efficient way. Supported inputs are 1D numpy or awkward arrays.

Parameters:

dtype (numpy.dtype or str) – internal bitwidth of the packed array, which governs the maximum number of selections storable in this object. The default value is uint32, which allows up to 32 booleans to be stored. If fewer or more selections need to be stored, one can choose uint16 (up to 16) or uint64 (up to 64) instead.
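The relationship between bitwidth and capacity can be illustrated with a minimal sketch. It is not the coffea implementation: the helper `pack` and the table `BITWIDTH` are hypothetical names, and plain Python integers stand in for the packed NumPy array. It assumes each selection occupies one bit of a per-event integer, in the order the selections are added.

```python
# Illustrative sketch only, not the coffea implementation.
# Assumption: each selection owns one bit; each event is one unsigned integer.

BITWIDTH = {"uint16": 16, "uint32": 32, "uint64": 64}


def pack(selections, dtype="uint32"):
    """Pack {name: [bool, ...]} into one integer per event (hypothetical helper)."""
    names = list(selections)
    if len(names) > BITWIDTH[dtype]:
        raise ValueError(f"{dtype} holds at most {BITWIDTH[dtype]} selections")
    n_events = len(selections[names[0]]) if names else 0
    packed = [0] * n_events
    for bit, name in enumerate(names):
        for i, passed in enumerate(selections[name]):
            if passed:
                packed[i] |= 1 << bit  # set this selection's bit for event i
    return names, packed


names, packed = pack({"cut1": [True, False, True], "cut2": [True, True, False]})
print(names)   # ['cut1', 'cut2']
print(packed)  # [3, 2, 1]
```

Under this assumption, a 17th selection cannot fit in a uint16, which is why the bitwidth caps the number of storable selections.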

Attributes Summary

delayed_mode

maxitems

names

Current list of mask names available

Methods Summary

add(name, selection[, fill_value])

Add a new boolean array

add_multiple(selections[, fill_value])

Add multiple boolean arrays at once, see add for details

all(*names)

Shorthand for require, where all the values are True.

allfalse(*names)

Shorthand for require, where all the values are False.

any(*names)

Return a mask vector corresponding to an inclusive OR of requirements

cutflow(*names)

Compute the cutflow for a set of selections

nminusone(*names)

Compute the "N-1" style selection for a set of selections

require(**names)

Return a mask vector corresponding to specific requirements

Attributes Documentation

delayed_mode
maxitems
names

Current list of mask names available

Methods Documentation

add(name, selection, fill_value=False)[source]

Add a new boolean array

Parameters:
  • name (str) – name of the selection

  • selection (numpy.ndarray or awkward.Array) – a flat array of type bool or ?bool. If this is not the first selection added, it must also have the same shape as previously added selections. If the array is option-type, null entries will be filled with fill_value.

  • fill_value (bool, optional) – All masked entries will be filled as specified (default: False)
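The effect of fill_value on an option-type selection can be sketched as follows. This is an illustration rather than the library's code: `fill_nulls` is a hypothetical helper, and a Python list containing `None` stands in for an awkward array of type ?bool.

```python
# Illustrative sketch only: None plays the role of a null (masked) entry.

def fill_nulls(selection, fill_value=False):
    """Replace null entries with fill_value; keep True/False unchanged."""
    return [fill_value if entry is None else bool(entry) for entry in selection]


option_mask = [True, None, False, None]
print(fill_nulls(option_mask))                   # [True, False, False, False]
print(fill_nulls(option_mask, fill_value=True))  # [True, True, False, True]
```

With the default fill_value=False, events whose selection value is missing are treated as failing that selection.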

add_multiple(selections, fill_value=False)[source]

Add multiple boolean arrays at once, see add for details

Parameters:
  • selections (dict) – a dictionary of selections, in the form {name: selection}

  • fill_value (bool, optional) – All masked entries will be filled as specified (default: False)

all(*names)[source]

Shorthand for require, where all the values are True. If no arguments are given, all the added selections are required to be True.

allfalse(*names)[source]

Shorthand for require, where all the values are False. If no arguments are given, all the added selections are required to be False.

any(*names)[source]

Return a mask vector corresponding to an inclusive OR of requirements

Parameters:

*names (args) – The named selections to allow

Examples

If

>>> selection.names
['cut1', 'cut2', 'cut3']

then

>>> selection.any("cut1", "cut2")
array([True, False, True, ...])

returns a boolean array in which an entry is True if, for the corresponding event, cut1 == True or cut2 == True; the value of cut3 is ignored.
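The inclusive-OR semantics can be sketched with bit masks. The code below is an illustration under assumptions, not the coffea source: `any_of` is a hypothetical helper, selections are assumed to occupy bits in the order of `names`, and Python integers stand in for the packed array.

```python
# Illustrative sketch only, assuming bit 0 = cut1, bit 1 = cut2, bit 2 = cut3.

def any_of(names, packed, *selected):
    """True for an event if at least one of the selected bits is set."""
    mask = 0
    for name in selected:
        mask |= 1 << names.index(name)
    return [(value & mask) != 0 for value in packed]


names = ["cut1", "cut2", "cut3"]
packed = [0b001, 0b100, 0b110, 0b000]  # one integer per event
print(any_of(names, packed, "cut1", "cut2"))  # [True, False, True, False]
```

The second event passes only cut3, which is not among the requested names, so its entry is False.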

cutflow(*names)[source]

Compute the cutflow for a set of selections

Returns an object that can provide a list of event counts, where each count is the number of events passing the current selection and all those before it, with the named selections applied consecutively. The first element of the list is the total number of events before any selection is applied; the last element is the number of events that pass all of the selections. The object can also return the cutflow as a hist.Hist histogram whose bin heights are the counts in that list. If the PackedSelection is in delayed mode, the elements of the list are dask_awkward arrays that can be computed whenever the user wants. If the histogram is requested, those delayed arrays are computed in the process in order to set the bin heights.

Parameters:

*names (args) – The named selections to use, which must be a subset of the selections already added

Returns:

res – A wrapper class for the results, see the documentation for that class for more details

Return type:

coffea.analysis_tools.Cutflow
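The cumulative counting described for cutflow can be sketched in plain Python. This is not the Cutflow class: `cutflow_counts` is a hypothetical helper that reproduces only the list of counts, using eager lists of booleans instead of packed or delayed arrays.

```python
# Illustrative sketch only: reproduces the list of cumulative event counts.

def cutflow_counts(selections, *names):
    """Return [total, after names[0], after names[0] and names[1], ...]."""
    n_events = len(next(iter(selections.values())))
    passing = [True] * n_events
    counts = [n_events]  # first element: events before any selection
    for name in names:
        passing = [p and s for p, s in zip(passing, selections[name])]
        counts.append(sum(passing))
    return counts


selections = {
    "cut1": [True, True, True, False, True],
    "cut2": [True, False, True, True, True],
    "cut3": [False, True, True, True, True],
}
print(cutflow_counts(selections, "cut1", "cut2", "cut3"))  # [5, 4, 3, 2]
```

Because the selections are applied consecutively, the counts never increase from one element to the next.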

nminusone(*names)[source]

Compute the “N-1” style selection for a set of selections

Returns an object that can provide a list of event counts in which each count is the number of events passing all of the named selections except one, with each selection ignored in turn. The first element of the list is the total number of events before any selection is applied; the last element is the number of events that pass when all selections are applied. The object also provides a list of boolean mask vectors marking which events pass each N-1 selection, and can return a hist.Hist histogram whose bin heights are the counts in the N-1 list. If the PackedSelection is in delayed mode, the elements of those lists are dask_awkward arrays that can be computed whenever the user wants. If the histogram is requested, the delayed arrays in the list of counts are computed in the process in order to set the bin heights.

Parameters:

*names (args) – The named selections to use, which must be a subset of the selections already added

Returns:

res – A wrapper class for the results, see the documentation for that class for more details

Return type:

coffea.analysis_tools.NminusOne
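The N-1 counting can be sketched in the same spirit. This is not the NminusOne class: `nminusone_counts` is a hypothetical helper that reproduces only the list of counts, using eager lists of booleans.

```python
# Illustrative sketch only: counts for "all selections except one".

def nminusone_counts(selections, *names):
    """Return [total, N-1 ignoring names[0], ..., N-1 ignoring names[-1], all N]."""
    n_events = len(next(iter(selections.values())))

    def count(active):
        # Number of events passing every selection listed in `active`.
        return sum(all(selections[n][i] for n in active) for i in range(n_events))

    counts = [n_events]  # first element: events before any selection
    for ignored in names:
        counts.append(count([n for n in names if n != ignored]))
    counts.append(count(names))  # last element: all selections applied
    return counts


selections = {
    "cut1": [True, True, True, False, True],
    "cut2": [True, False, True, True, True],
    "cut3": [False, True, True, True, True],
}
print(nminusone_counts(selections, "cut1", "cut2", "cut3"))  # [5, 3, 3, 3, 2]
```

Comparing each N-1 count with the final element shows how many events are removed by the ignored selection alone.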

require(**names)

Return a mask vector corresponding to specific requirements

Specify an exact requirement on an arbitrary subset of the masks

Parameters:

**names (kwargs) – The selections to constrain, each given as a keyword argument of the form name=True or name=False.

Examples

If

>>> selection.names
['cut1', 'cut2', 'cut3']

then

>>> selection.require(cut1=True, cut2=False)
array([True, False, True, ...])

returns a boolean array in which an entry is True if, for the corresponding event, cut1 == True and cut2 == False; the value of cut3 is ignored.
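An exact requirement on a subset of masks can be sketched as a masked comparison. The code is an illustration under assumptions, not the coffea source: the functions below are hypothetical stand-ins, selections are assumed to occupy bits in the order of `names`, and Python integers stand in for the packed array. The helpers `all_true` and `all_false` mirror the documented behavior of all and allfalse, including the fallback to every added selection when no names are given.

```python
# Illustrative sketch only, assuming bit 0 = cut1, bit 1 = cut2, bit 2 = cut3.

def require(names, packed, **requirements):
    """True for an event if every constrained bit has its required value."""
    mask = 0      # bits that are constrained
    expected = 0  # required values of those bits
    for name, value in requirements.items():
        bit = 1 << names.index(name)
        mask |= bit
        if value:
            expected |= bit
    return [(v & mask) == expected for v in packed]


def all_true(names, packed, *selected):
    """Shorthand: the selected (or all) selections must be True."""
    return require(names, packed, **{n: True for n in (selected or names)})


def all_false(names, packed, *selected):
    """Shorthand: the selected (or all) selections must be False."""
    return require(names, packed, **{n: False for n in (selected or names)})


names = ["cut1", "cut2", "cut3"]
packed = [0b001, 0b101, 0b011, 0b110]  # one integer per event
print(require(names, packed, cut1=True, cut2=False))  # [True, True, False, False]
print(all_true(names, packed, "cut1", "cut3"))        # [False, True, False, False]
print(all_false(names, packed, "cut2"))               # [True, True, False, False]
```

Bits outside the mask never enter the comparison, which is why unconstrained selections such as cut3 in the first call can take either value.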