Installing coffea

Quick start

To try coffea now, without installing anything, you can experiment with our hosted tutorial notebooks.

Platform support

Coffea is a Python package distributed via PyPI. It requires a Python installation, version 3.6 or newer.

All functional features in each supported python version are routinely tested. You can see the python version you have installed by typing the following at the command prompt:

python --version

or, in some cases, if both python 2 and 3 are available, you can find the python 3 version via:

python3 --version
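The same check can be made from inside the interpreter, which is handy at the top of an analysis script. The following is a minimal sketch; python_is_supported is a hypothetical helper name, and the minimum version is simply the one stated above.

```python
import sys

# Minimum Python version stated in this section.
MINIMUM = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    print("Running Python", sys.version.split()[0])
    print("Meets the coffea minimum:", python_is_supported())
```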

Coffea core functionality is routinely tested on Windows, Linux, and macOS. All Local executors are tested on all three platforms; however, the Distributed executors are not routinely tested on Windows.

On PyPI, coffea releases start at v0.5.0, because earlier versions were published under the name fnal-column-analysis-tools. If you are still using fnal-column-analysis-tools, please move to coffea!
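If you are unsure which of the two distributions is present in an environment, a check along the following lines may help. This is a minimal sketch: installed_version is a hypothetical helper name, and it relies on importlib.metadata, which is part of the standard library only from Python 3.8 onward (on 3.6 and 3.7 the importlib_metadata backport is assumed to be installed).

```python
try:
    from importlib import metadata  # standard library in Python 3.8+
except ImportError:
    # Assumption: the importlib_metadata backport is installed on 3.6/3.7.
    import importlib_metadata as metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("coffea", "fnal-column-analysis-tools"):
        print(name, "->", installed_version(name) or "not installed")
```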

Install coffea

To install coffea, there are several mostly-equivalent options:

  • install coffea system-wide using pip install coffea;

  • if you do not have administrator permissions, install as local user with pip install --user coffea;

  • if you prefer to not place coffea in your global environment, you can set up a Virtual environment;

  • if you use Conda, simply conda install coffea;

  • or, if you like to use containers, see Pre-built images below.

To update a previously installed coffea to a newer version, use:

pip install --upgrade coffea

Although not required, it is recommended to also install Jupyter, as it provides a more interactive development environment. The installation procedure is essentially the same as above: pip install jupyter. (If you use conda, conda install jupyter is a better option.)

In rare cases, you may find that the pip executable in your path does not correspond to the same python installation as the python executable. This is a sign of a broken python environment. However, this can be bypassed by using the syntax python -m pip ... in place of pip ....
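One way to detect such a mismatch is to compare where the pip on your path and python -m pip are loaded from. The sketch below assumes the usual "pip X from PATH (python Y)" format printed by pip --version; parse_pip_version and pip_location are hypothetical helper names, not part of pip or coffea.

```python
import os
import re
import shutil
import subprocess
import sys

# Assumed output format: "pip 21.0.1 from /path/to/pip (python 3.8)"
PIP_VERSION_RE = re.compile(r"^pip (\S+) from (.+) \(python (\d+(?:\.\d+)*)\)")


def parse_pip_version(output):
    """Return (pip_version, location, python_version) from `pip --version`."""
    match = PIP_VERSION_RE.match(output.strip())
    if match is None:
        raise ValueError("unrecognized pip --version output: %r" % output)
    return match.groups()


def pip_location(command):
    """Run `<command> --version` and return the resolved pip directory."""
    result = subprocess.run(
        command + ["--version"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return os.path.realpath(parse_pip_version(result.stdout)[1])


if __name__ == "__main__":
    pip_exe = shutil.which("pip")
    if pip_exe is None:
        print("No pip executable on PATH; use: python -m pip ...")
    else:
        try:
            same = pip_location([pip_exe]) == pip_location([sys.executable, "-m", "pip"])
        except (OSError, ValueError, subprocess.CalledProcessError) as error:
            print("Could not query pip:", error)
        else:
            print("pip matches python" if same else "Mismatch; use: python -m pip ...")
```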

Install optional dependencies

Coffea supports several optional components that require additional package installations. In particular, all of the Distributed executors require additional packages. The necessary dependencies can be installed easily via pip using the setuptools extras facility:

  • Apache Spark distributed executor: pip install coffea[spark]

  • parsl distributed executor: pip install coffea[parsl]

  • dask distributed executor: pip install coffea[dask]

  • Work Queue distributed executor: see Work Queue Executor for installation instructions

Multiple extras can be installed together via, e.g. pip install coffea[dask,spark]
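To see which of these optional components are already importable in the current environment, something like the following can be used. It is only a sketch: the mapping from extras names to top-level module names (pyspark, parsl, dask) is an assumption, and available_extras is a hypothetical helper, not a coffea function.

```python
import importlib.util

# Assumed mapping: extras name -> top-level module that the extra provides.
EXTRA_MODULES = {"spark": "pyspark", "parsl": "parsl", "dask": "dask"}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extras_name: True/False} depending on whether the module is importable."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, present in available_extras().items():
        status = "available" if present else "missing (pip install coffea[%s])" % extra
        print(extra, "->", status)
```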

Virtual environment

Virtual environments are a good way to isolate Python environments and to ensure there are no hidden dependencies. You can find more information at https://docs.python.org/3/library/venv.html

python -m venv my_env
source my_env/bin/activate
pip install coffea
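To confirm that the interpreter you are running actually belongs to an activated virtual environment, you can compare sys.prefix with sys.base_prefix, which differ inside an environment created by venv. A minimal sketch, with in_virtual_environment as a hypothetical helper name:

```python
import sys


def in_virtual_environment():
    """Return True when running inside an environment created by venv."""
    # sys.base_prefix points at the base installation; inside a venv,
    # sys.prefix points at the environment directory instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```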

Pre-built images

A complete coffea + scientific python environment is available as a docker image:

docker run -it --name docker-coffea-base coffeateam/coffea-base

More information is available at https://github.com/CoffeaTeam/docker-coffea-base#readme

Additionally, there is an image with dask dependencies (including dask-jobqueue):

docker run -it --name docker-coffea-dask coffeateam/coffea-dask

The corresponding repository is at https://github.com/CoffeaTeam/docker-coffea-dask#readme

If you use singularity, there are preconverted images available via the unpacked.cern.ch service. For example, you can start a shell with:

singularity shell -B ${PWD}:/work /cvmfs/unpacked.cern.ch/registry.hub.docker.com/coffeateam/coffea-dask:latest

Install via cvmfs

Although the local installation can work anywhere, the user-local package directory can become quite bloated if the base environment does not already have most of the coffea dependencies. One way to avoid this bloat is to use a base Python environment provided via CERN LCG, which is available on any system that has the cvmfs directory /cvmfs/sft.cern.ch/ mounted. Simply source an LCG release (shown here: 98python3) and install:

# check your platform: CC7 shown below, for SL6 it would be "x86_64-slc6-gcc8-opt"
source /cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc9-opt/setup.sh  # or .csh, etc.
pip install --user coffea

This method can be fragile, since the LCG-distributed packages may conflict with the coffea dependencies. In general it is better to define your own environment or use an image.
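Before relying on this method, it can be worth confirming that cvmfs is mounted and that the chosen LCG view exists on the machine. The following sketch does only that; lcg_setup_script is a hypothetical helper name, and the default path is the CC7 view used in the example above.

```python
import os

# LCG view from the example above; adjust for your platform and release.
LCG_VIEW = "/cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc9-opt"


def lcg_setup_script(view=LCG_VIEW):
    """Return the path to setup.sh in the given view, or None if unavailable."""
    script = os.path.join(view, "setup.sh")
    return script if os.path.isfile(script) else None


if __name__ == "__main__":
    script = lcg_setup_script()
    if script is None:
        print("LCG view not found; is /cvmfs/sft.cern.ch/ mounted?")
    else:
        print("source", script)
```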

Creating a cvmfs-based portable virtual environment

In some instances, it may be useful to have a self-contained environment that can be relocated. One use case is for users of coffea who do not have access to a distributed compute cluster that is compatible with one of the coffea distributed executors. Here, a fallback solution is to create traditional batch jobs (e.g. condor) which then use coffea local executors, possibly multi-threaded. In this case, the user-local Python package directory is often not available from batch workers, so a portable Python environment needs to be created. Annoyingly, Python virtual environments are not portable by default, because several paths are hardcoded in specific locations. However, there are not many such locations, and some sed hacks can save the day.

Here is an example of a bash script that installs coffea on top of the LCG 98python3 software stack inside a portable virtual environment, with the caveat that cvmfs must be visible from batch workers:

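#!/usr/bin/env bash
NAME=coffeaenv
LCG=/cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc9-opt

source $LCG/setup.sh
# following https://aarongorka.com/blog/portable-virtualenv/, an alternative is https://github.com/pantsbuild/pex
# create the environment with copies of the python binaries rather than symlinks
python -m venv --copies $NAME
source $NAME/bin/activate
# site-packages directory of the new environment, relative to the current directory
LOCALPATH=$NAME$(python -c 'import sys; print(f"/lib/python{sys.version_info.major}.{sys.version_info.minor}/site-packages")')
export PYTHONPATH=${LOCALPATH}:$PYTHONPATH
python -m pip install setuptools pip wheel --upgrade
python -m pip install coffea
# replace the hardcoded interpreter path in the shebang of each installed script
sed -i '1s/#!.*python$/#!\/usr\/bin\/env python/' $NAME/bin/*
# make activate derive VIRTUAL_ENV from its own location (line 40 is assumed to hold that assignment)
sed -i '40s/.*/VIRTUAL_ENV="$(cd "$(dirname "$(dirname "${BASH_SOURCE[0]}" )")" \&\& pwd)"/' $NAME/bin/activate
# have activate also set up the LCG stack and PYTHONPATH
sed -i "2a source ${LCG}/setup.sh" $NAME/bin/activate
sed -i "3a export PYTHONPATH=${LOCALPATH}:\$PYTHONPATH" $NAME/bin/activate
# pack the environment so it can be shipped to batch workers
tar -zcf ${NAME}.tar.gz ${NAME}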

The resulting tarball size is about 60 MB. An example batch job wrapper script is:

#!/usr/bin/env bash
tar -zxf coffeaenv.tar.gz
source coffeaenv/bin/activate

# quote "$@" so that arguments containing spaces are passed through intact
echo "Running command:" "$@"
time "$@" || exit $?

Note that this environment only functions from the working directory of the wrapper script, because it relies on relative paths. Unless you install jupyter into this environment (which may bloat the tarball; the LCG98 jupyter is reasonably recent), the environment is not visible inside the LCG jupyter server. From a shell with the virtual environment activated, you can execute:

python -m ipykernel install --user --name=coffeaenv

to make a new kernel available that uses this environment.
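Because portability here depends on removing hardcoded interpreter paths, it can be useful to check that no script in the environment still carries one before shipping the tarball. The sketch below lists scripts whose shebang is not of the "#!/usr/bin/env ..." form; absolute_shebangs is a hypothetical helper name, and it inspects only the first line of each file in the given directory.

```python
import os


def absolute_shebangs(bin_dir):
    """Return {filename: shebang} for scripts not using '#!/usr/bin/env ...'."""
    found = {}
    for name in sorted(os.listdir(bin_dir)):
        path = os.path.join(bin_dir, name)
        if not os.path.isfile(path):
            continue
        # Read as bytes so that binary files (e.g. the copied python
        # executable) do not cause decoding errors.
        with open(path, "rb") as handle:
            first_line = handle.readline().rstrip(b"\r\n")
        if first_line.startswith(b"#!") and not first_line.startswith(b"#!/usr/bin/env "):
            found[name] = first_line.decode("utf-8", errors="replace")
    return found


if __name__ == "__main__":
    bin_dir = os.path.join("coffeaenv", "bin")  # environment name from the script above
    if os.path.isdir(bin_dir):
        for name, shebang in absolute_shebangs(bin_dir).items():
            print(name, "still has a hardcoded interpreter:", shebang)
```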

For Developers

  1. Download source:

git clone https://github.com/CoffeaTeam/coffea

  2. Install with development dependencies:

cd coffea
pip install --editable .[dev]
# or if you need to work on the executors, e.g. dask,
pip install --editable .[dev,dask]

  3. Develop a cool new feature or fix some bugs

  4. Lint source, run tests, and build documentation:

flake8 coffea
black coffea
pytest tests
pushd docs && make html && popd

  5. Make a pull request!