coffea - Columnar Object Framework For Effective Analysis
Basic tools and wrappers for enabling not-too-alien syntax when running columnar Collider HEP analysis.
coffea is a prototype package that pulls together the typical needs of a high-energy collider physics (HEP) experiment analysis using the scientific Python ecosystem. It uses uproot and awkward-array to provide an array-based syntax for manipulating HEP event data in an efficient and numpythonic way. Sub-packages implement the histogramming, plotting, and look-up table functionality needed to convey scientific insight, apply transformations to data, and correct for discrepancies between Monte Carlo simulations and data.
coffea also supplies facilities for horizontally scaling an analysis in order to reduce time-to-insight in a way that is largely independent of the resource the analysis is executed on. By making use of modern big-data technologies like Apache Spark, parsl, Dask, and Work Queue, coffea makes it possible to scale a HEP analysis from testing on a laptop to a large multi-core server, a computing cluster, or a supercomputer without altering or otherwise adapting the analysis code itself.
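The idea of keeping the analysis code fixed while swapping the execution backend can be illustrated with a minimal sketch in plain Python. This is not coffea's API: the names `process_chunk` and `run` and the toy data are hypothetical, and the standard-library thread pool merely stands in for a real backend such as Dask or Work Queue.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical per-chunk analysis: count events by jet multiplicity."""
    return Counter(len(event_jets) for event_jets in chunk)


def run(chunks, executor_map):
    """Apply process_chunk to every chunk and merge the partial results.

    executor_map is any callable with the signature of the built-in map, so
    the analysis code stays the same whichever backend supplies it.
    """
    total = Counter()
    for partial in executor_map(process_chunk, chunks):
        total += partial
    return total


# Toy data: each chunk is a list of events, each event a list of jet pT values.
chunks = [
    [[50.0, 30.0], [], [20.0]],
    [[45.0], [60.0, 25.0, 22.0]],
]

serial = run(chunks, map)  # single-threaded, e.g. for testing on a laptop
with ThreadPoolExecutor(max_workers=2) as pool:
    parallel = run(chunks, pool.map)  # same analysis code, different backend
```

Because the partial results are merged by addition, the order in which chunks complete does not affect the final answer, which is what allows the work to be spread over many workers.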
coffea is a HEP community project collaborating with IRIS-HEP and is currently a prototype. We welcome input to improve its quality as we progress towards a sensible refactoring into the scientific Python ecosystem and a first release. Please feel free to contribute at our GitHub repo!
Installation
Install coffea like any other Python package:
pip install coffea
or similar (use sudo, --user, virtualenv, or pip-in-conda if you wish).
For more details, see the Installing coffea section of the documentation.
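After installing, one way to confirm that the package is visible to the current interpreter is to query its installed version. The sketch below uses only the standard library and assumes Python 3.8 or newer (where `importlib.metadata` is available); the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("coffea")
print("coffea is not installed" if version is None else f"coffea {version}")
```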
Strict dependencies
Python (3.6+)
The following are installed automatically when you install coffea with pip:
numpy (1.15+);
uproot for interacting with ROOT files and handling their data transparently;
awkward-array to manipulate complex-structured columnar data, such as jagged arrays;
numba for just-in-time compilation of Python functions;
scipy for many statistical functions;
matplotlib as a plotting backend;
and other utility packages, as enumerated in setup.py.
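For readers new to the jagged arrays mentioned in the list above, here is a rough sketch of the underlying idea in plain Python: a variable-length collection per event is stored as one flat column plus offsets that mark event boundaries. It only illustrates the layout and does not use awkward-array; the toy data and the names `content`, `offsets`, and `event_slice` are assumptions made for the example.

```python
from itertools import accumulate

# Toy events: jet pT values per event; the second event has no jets.
events = [[50.0, 30.0], [], [20.0, 45.0, 10.0]]

# Columnar layout: one flat content column plus offsets marking event boundaries.
content = [pt for event in events for pt in event]
offsets = [0] + list(accumulate(len(event) for event in events))


def event_slice(i):
    """Recover the jets of event i from the flat column."""
    return content[offsets[i]:offsets[i + 1]]


# A per-jet cut acts on the flat column; per-event counts are recovered via offsets.
passes = [pt > 25.0 for pt in content]
n_selected = [sum(passes[offsets[i]:offsets[i + 1]]) for i in range(len(events))]
```

In a real analysis the flat column would be a numpy array and the cut a single vectorized operation, which is where the efficiency of the columnar approach comes from.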
Documentation
All documentation is hosted at https://coffeateam.github.io/coffea/
- Installing coffea
- Coffea by Example
- Coffea concepts
- API Reference Guide
- coffea.analysis_tools
- coffea.btag_tools
- coffea.hist
- Functions
- Classes
- Hist
Hist.DEFAULT_DTYPE
Hist.fields
Hist.label
Hist.add()
Hist.axes()
Hist.axis()
Hist.clear()
Hist.compatible()
Hist.copy()
Hist.dense_axes()
Hist.dense_dim()
Hist.dim()
Hist.fill()
Hist.group()
Hist.identifiers()
Hist.identity()
Hist.integrate()
Hist.project()
Hist.rebin()
Hist.remove()
Hist.scale()
Hist.sparse_axes()
Hist.sparse_dim()
Hist.sparse_nbins()
Hist.sum()
Hist.to_boost()
Hist.to_hist()
Hist.values()
- Bin
- Interval
- Cat
- StringBin
- Class Inheritance Diagram
- coffea.jetmet_tools
- coffea.lookup_tools
- coffea.lumi_tools
- coffea.nanoevents
- coffea.nanoevents.methods.base
- coffea.nanoevents.methods.candidate
- coffea.nanoevents.methods.nanoaod
- coffea.nanoevents.methods.vector
- Classes
- TwoVector
- PolarTwoVector
- ThreeVector
- SphericalThreeVector
- LorentzVector
LorentzVector.boostvec
LorentzVector.energy
LorentzVector.eta
LorentzVector.mass
LorentzVector.mass2
LorentzVector.pvec
LorentzVector.absolute()
LorentzVector.add()
LorentzVector.boost()
LorentzVector.delta_r()
LorentzVector.delta_r2()
LorentzVector.metric_table()
LorentzVector.multiply()
LorentzVector.nearest()
LorentzVector.negative()
LorentzVector.subtract()
LorentzVector.sum()
- PtEtaPhiMLorentzVector
PtEtaPhiMLorentzVector.E
PtEtaPhiMLorentzVector.eta
PtEtaPhiMLorentzVector.mass
PtEtaPhiMLorentzVector.mass2
PtEtaPhiMLorentzVector.phi
PtEtaPhiMLorentzVector.pt
PtEtaPhiMLorentzVector.r
PtEtaPhiMLorentzVector.rho
PtEtaPhiMLorentzVector.rho2
PtEtaPhiMLorentzVector.t
PtEtaPhiMLorentzVector.theta
PtEtaPhiMLorentzVector.z
PtEtaPhiMLorentzVector.multiply()
PtEtaPhiMLorentzVector.negative()
- PtEtaPhiELorentzVector
PtEtaPhiELorentzVector.E
PtEtaPhiELorentzVector.energy
PtEtaPhiELorentzVector.eta
PtEtaPhiELorentzVector.phi
PtEtaPhiELorentzVector.pt
PtEtaPhiELorentzVector.r
PtEtaPhiELorentzVector.rho
PtEtaPhiELorentzVector.rho2
PtEtaPhiELorentzVector.t
PtEtaPhiELorentzVector.theta
PtEtaPhiELorentzVector.z
PtEtaPhiELorentzVector.multiply()
PtEtaPhiELorentzVector.negative()
- Class Inheritance Diagram
- coffea.processor
- Functions
- Classes
- ProcessorABC
- LazyDataFrame
- Weights
- PackedSelection
- IterativeExecutor
- FuturesExecutor
- DaskExecutor
- ParslExecutor
- WorkQueueExecutor
WorkQueueExecutor.bar_format
WorkQueueExecutor.chunks_accum_in_mem
WorkQueueExecutor.chunks_per_accum
WorkQueueExecutor.chunksize
WorkQueueExecutor.compression
WorkQueueExecutor.cores
WorkQueueExecutor.custom_init
WorkQueueExecutor.debug_log
WorkQueueExecutor.disk
WorkQueueExecutor.dynamic_chunksize
WorkQueueExecutor.environment_file
WorkQueueExecutor.events_total
WorkQueueExecutor.fast_terminate_workers
WorkQueueExecutor.filepath
WorkQueueExecutor.gpus
WorkQueueExecutor.manager_name
WorkQueueExecutor.master_name
WorkQueueExecutor.memory
WorkQueueExecutor.password_file
WorkQueueExecutor.port
WorkQueueExecutor.print_stdout
WorkQueueExecutor.resource_monitor
WorkQueueExecutor.resources_mode
WorkQueueExecutor.retries
WorkQueueExecutor.split_on_exhaustion
WorkQueueExecutor.ssl
WorkQueueExecutor.stats_log
WorkQueueExecutor.status_display_interval
WorkQueueExecutor.tasks_accum_log
WorkQueueExecutor.transactions_log
WorkQueueExecutor.treereduction
WorkQueueExecutor.verbose
WorkQueueExecutor.wrapper
WorkQueueExecutor.x509_proxy
WorkQueueExecutor.__call__()
- Runner
Runner.align_clusters
Runner.cachestrategy
Runner.chunksize
Runner.dynamic_chunksize
Runner.format
Runner.maxchunks
Runner.metadata_cache
Runner.mmap
Runner.pre_executor
Runner.processor_compression
Runner.retries
Runner.savemetrics
Runner.skipbadfiles
Runner.use_dataframes
Runner.use_skyhook
Runner.xrootdtimeout
Runner.__call__()
Runner.automatic_retries()
Runner.get_cache()
Runner.metadata_fetcher()
Runner.preprocess()
Runner.read_coffea_config()
Runner.run()
- AccumulatorABC
- value_accumulator
- list_accumulator
- set_accumulator
- dict_accumulator
- defaultdict_accumulator
- column_accumulator
- NanoAODSchema
- TreeMakerSchema
- iterative_executor
- futures_executor
- dask_executor
- parsl_executor
- work_queue_executor
- Class Inheritance Diagram
- coffea.util