Installing coffea
Quick start
To try coffea now, without installing anything, you can experiment with our hosted tutorial notebooks.
Platform support
Coffea is a Python package distributed via PyPI, so a Python installation is required to use it. Python 3.6 or newer is required.
All functional features in each supported python version are routinely tested. You can see the python version you have installed by typing the following at the command prompt:
python --version
or, if both python 2 and python 3 are installed, you can find the python 3 version via:
python3 --version
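As a rough sketch of checking the 3.6 minimum from a script, the shell function below parses a version string of the form printed by python --version. The name python_version_ok is hypothetical and is not part of coffea.

```shell
# Hypothetical helper: succeeds if a version string such as "Python 3.8.10"
# is at least 3.6.
python_version_ok() {
    version="${1#Python }"   # strip the leading "Python " if present
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # text before the next dot
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 6 ]; }
}
```

One possible use is python_version_ok "$(python3 --version 2>&1)" && echo "supported"; the 2>&1 is there because python 2 prints its version to stderr.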
Core coffea functionality is routinely tested on Windows, Linux, and macOS. All Local executors are tested on all three platforms; the Distributed executors, however, are not routinely tested on Windows.
On PyPI, coffea releases start at v0.5.0; earlier versions were published under the name fnal-column-analysis-tools. If you are still using fnal-column-analysis-tools, please move to coffea!
Install coffea
To install coffea, there are several mostly-equivalent options:
- install coffea system-wide using
  pip install coffea
- if you do not have administrator permissions, install as local user with
  pip install --user coffea
- if you prefer to not place coffea in your global environment, you can set up a Virtual environment;
- if you use Conda, simply
  conda install coffea
- or, if you like to use containers, see Pre-built images below.
To update a previously installed coffea to a newer version, use: pip install --upgrade coffea
Although not required, it is recommended to also install Jupyter, as it provides a more interactive development environment.
The installation procedure is essentially the same as above: pip install jupyter. (If you use conda, conda install jupyter is a better option.)
In rare cases, you may find that the pip executable in your path does not correspond to the same python installation as the python executable. This is a sign of a broken python environment. However, you can bypass this by replacing pip ... with python -m pip ... in your commands.
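To see whether pip and python disagree, one option is to compare the install location that each reports. The sketch below assumes the usual pip --version output format, "pip X from PATH (python Y)"; pip_location is a hypothetical helper name.

```shell
# Hypothetical helper: print the install path contained in a line such as
# "pip 21.0 from /usr/lib/python3.8/site-packages/pip (python 3.8)".
pip_location() {
    printf '%s\n' "$1" | sed -n 's/^pip [^ ]* from \(.*\) (python [^)]*)$/\1/p'
}
```

For instance, if pip_location "$(pip --version)" and pip_location "$(python -m pip --version)" print different paths, the two executables belong to different python installations.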
Install optional dependencies
Coffea supports several optional components that require additional package installations.
In particular, all of the Distributed executors require additional packages.
The necessary dependencies can be installed easily via pip using the setuptools extras facility:
Apache Spark distributed executor:
pip install coffea[spark]
parsl distributed executor:
pip install coffea[parsl]
dask distributed executor:
pip install coffea[dask]
Work Queue distributed executor: see Work Queue Executor for installation instructions
Multiple extras can be installed together via, e.g. pip install coffea[dask,spark]
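Some shells, zsh in particular, treat square brackets as glob characters, so the requirement may need quoting. The sketch below builds a quoted requirement string from a list of extras; coffea_requirement is a hypothetical helper name, not something coffea provides.

```shell
# Hypothetical helper: join the given extras into "coffea[a,b,...]",
# or print plain "coffea" when no extras are given.
coffea_requirement() {
    if [ "$#" -eq 0 ]; then
        printf 'coffea\n'
        return
    fi
    extras=$(IFS=,; printf '%s' "$*")   # "$*" joins arguments with the first character of IFS
    printf 'coffea[%s]\n' "$extras"
}
```

A possible use is pip install "$(coffea_requirement dask spark)", which passes coffea[dask,spark] to pip as a single quoted argument.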
Virtual environment
Virtual environments are a good way to isolate python environments and to ensure there are no hidden dependencies. You can find more information at https://docs.python.org/3/library/venv.html
python -m venv my_env
source my_env/bin/activate
pip install coffea
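Before installing, it can help to confirm that the environment is actually active. The activate script exports the VIRTUAL_ENV variable, so a minimal check, with the hypothetical name venv_status, might look like this:

```shell
# Hypothetical helper: report whether a virtual environment is active,
# based on the VIRTUAL_ENV variable exported by the activate script.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        printf 'active: %s\n' "$VIRTUAL_ENV"
    else
        printf 'inactive\n'
    fi
}
```

Running deactivate (a shell function defined by the activate script) leaves the environment, after which venv_status would print inactive again.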
Pre-built images
A complete coffea + scientific python environment is available as a docker image:
docker run -it --name docker-coffea-base coffeateam/coffea-base
More information is available at https://github.com/CoffeaTeam/docker-coffea-base#readme
Additionally, there is an image with dask dependencies (including dask-jobqueue):
docker run -it --name docker-coffea-dask coffeateam/coffea-dask
The corresponding repository is at https://github.com/CoffeaTeam/docker-coffea-dask#readme
If you use singularity, there are preconverted images available via the unpacked.cern.ch service. For example, you can start a shell with:
singularity shell -B ${PWD}:/work /cvmfs/unpacked.cern.ch/registry.hub.docker.com/coffeateam/coffea-dask:latest
Install via cvmfs
Although the local installation can work anywhere, if the base environment does not already have most of the coffea dependencies, then the user-local package directory can become quite bloated.
An option to avoid this bloat is to use a base python environment provided via CERN LCG, which is available on any system that has the cvmfs directory /cvmfs/sft.cern.ch/ mounted.
Simply source an LCG release (shown here: 98python3) and install:
# check your platform: CC7 shown below, for SL6 it would be "x86_64-slc6-gcc8-opt"
source /cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc9-opt/setup.sh # or .csh, etc.
pip install --user coffea
This method can be fragile, since the LCG-distributed packages may conflict with the coffea dependencies. In general it is better to define your own environment or use an image.
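Because the LCG view only exists where cvmfs is mounted, a script may want to check for the setup file before sourcing it. The following is a minimal sketch; lcg_setup_script and source_lcg are hypothetical helper names, and the path layout is taken from the example above.

```shell
# Hypothetical helper: build the setup.sh path for an LCG release and platform.
lcg_setup_script() {
    printf '/cvmfs/sft.cern.ch/lcg/views/%s/%s/setup.sh\n' "$1" "$2"
}

# Hypothetical helper: source the LCG view if it is readable, otherwise fail.
source_lcg() {
    setup_script=$(lcg_setup_script "$1" "$2")
    if [ -r "$setup_script" ]; then
        set --               # clear our arguments so the sourced script does not see them
        . "$setup_script"
    else
        echo "LCG view not found: $setup_script" >&2
        return 1
    fi
}
```

For example, source_lcg LCG_98python3 x86_64-centos7-gcc9-opt && pip install --user coffea would only attempt the installation when the view is available.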
Creating a portable virtual environment
In some instances, it may be useful to have a self-contained environment that can be relocated. One use case is for coffea users who do not have access to a distributed compute cluster that is compatible with one of the coffea distributed executors. Here, a fallback solution is to create traditional batch jobs (e.g. condor) that use coffea local executors, possibly multi-threaded. In this case, the user-local python package directory is often not available from batch workers, so a portable python environment needs to be created.
Annoyingly, python virtual environments are not portable by default because of several hardcoded paths in specific locations; however, there are two workarounds presented below. In both cases, we make a virtual environment that starts from a non-system base python environment to reduce the number of packages that must be installed in the virtual environment. One can always start a venv from scratch, but the number of coffea dependencies makes the installation rather large, up to a few hundred MB.
Container-based
If we start from one of the singularity containers from the Pre-built images section, we don’t have to install nearly as much software in our virtual environment, letting the container image take care of the majority of the codebase. For example, the following code starts from the coffea-dask image and adds a special python module that is not included in the base image:
singularity shell -B ${PWD}:/srv /cvmfs/unpacked.cern.ch/registry.hub.docker.com/coffeateam/coffea-dask:latest
cd /srv
python -m venv --without-pip --system-site-packages myenv
source myenv/bin/activate
python -m pip install --ignore-installed h5py
This creates a virtual environment myenv and a directory with the same name where the extra python module h5py will be installed. At this point, the terminal prompt will look like (myenv) Singularity>, indicating you are inside a singularity image and have myenv activated. Next time you log in, only lines 1, 2, and 4 of the listing above (the singularity shell, cd, and source commands) need to be re-executed.
If using HTCondor for job submission, you can create a tarball of the virtual environment directory and then submit condor jobs using the +SingularityImage HTCondor option. Note that this option is not enabled by default in HTCondor installations, so you may need to talk to your site administrator to be able to use this option. You will also need to create a small wrapper script to re-source the environment to have the job use the same environment as your interactive container.
A complete example that runs at FNAL LPC is shown in this gist.
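As a rough illustration of what such a wrapper might do, the sketch below unpacks the tarball on the worker and adds the environment's site-packages directory to PYTHONPATH. This is an assumption-laden alternative to sourcing bin/activate, chosen because the activate script records the absolute path where the environment was created (/srv/myenv in the example above); it is not the content of the gist. The name use_unpacked_env is hypothetical.

```shell
# Hypothetical helper: unpack an environment tarball into the current directory
# and prepend its site-packages directories to PYTHONPATH.
use_unpacked_env() {
    tarball="$1"     # e.g. myenv.tar.gz, created with: tar -zcf myenv.tar.gz myenv
    env_name="$2"    # top-level directory inside the tarball, e.g. myenv
    tar -zxf "$tarball" || return 1
    for site_dir in "$PWD/$env_name"/lib/python*/site-packages; do
        [ -d "$site_dir" ] || continue
        PYTHONPATH="$site_dir${PYTHONPATH:+:$PYTHONPATH}"
    done
    export PYTHONPATH
}
```

A wrapper script could then call use_unpacked_env myenv.tar.gz myenv and finish with "$@" to run the actual job command inside the container.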
LCG-based
There are not many locations to edit to make a venv portable, and some sed hacks can save the day. Here is an example of a bash script that installs coffea on top of the LCG 98python3 software stack inside a portable virtual environment, with the caveat that cvmfs must be visible from batch workers:
#!/usr/bin/env bash
NAME=coffeaenv
LCG=/cvmfs/sft.cern.ch/lcg/views/LCG_98python3/x86_64-centos7-gcc9-opt
source $LCG/setup.sh
# following https://aarongorka.com/blog/portable-virtualenv/, an alternative is https://github.com/pantsbuild/pex
python -m venv --copies $NAME
source $NAME/bin/activate
LOCALPATH=$NAME$(python -c 'import sys; print(f"/lib/python{sys.version_info.major}.{sys.version_info.minor}/site-packages")')
export PYTHONPATH=${LOCALPATH}:$PYTHONPATH
python -m pip install setuptools pip wheel --upgrade
python -m pip install coffea
sed -i '1s/#!.*python$/#!\/usr\/bin\/env python/' $NAME/bin/*
sed -i '40s/.*/VIRTUAL_ENV="$(cd "$(dirname "$(dirname "${BASH_SOURCE[0]}" )")" \&\& pwd)"/' $NAME/bin/activate
sed -i "2a source ${LCG}/setup.sh" $NAME/bin/activate
sed -i "3a export PYTHONPATH=${LOCALPATH}:\$PYTHONPATH" $NAME/bin/activate
tar -zcf ${NAME}.tar.gz ${NAME}
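The script above rewrites line 40 of bin/activate, which depends on the layout of the activate script shipped with that particular python version. A possible variation, sketched here under the assumption that the script contains a line beginning with VIRTUAL_ENV=, matches that line by pattern instead of by number. The function name make_activate_relocatable is hypothetical.

```shell
# Hypothetical helper: replace the hardcoded VIRTUAL_ENV assignment in an
# activate script with one computed from the script's own location.
make_activate_relocatable() {
    activate_file="$1"
    tmp_file="${activate_file}.tmp"
    # \& keeps sed from expanding & to the matched text in the replacement
    sed 's|^VIRTUAL_ENV=.*|VIRTUAL_ENV="$(cd "$(dirname "$(dirname "${BASH_SOURCE[0]}")")" \&\& pwd)"|' \
        "$activate_file" > "$tmp_file" && mv "$tmp_file" "$activate_file"
}
```

It would be called as make_activate_relocatable $NAME/bin/activate. Writing to a temporary file avoids relying on sed -i, whose syntax differs between GNU and BSD sed.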
The resulting tarball size is about 60 MB. An example batch job wrapper script is:
#!/usr/bin/env bash
tar -zxf coffeaenv.tar.gz
source coffeaenv/bin/activate
echo "Running command:" "$@"
time "$@" || exit $?
Note that this environment only functions from the working directory of the wrapper script because it relies on relative paths. Unless you install jupyter into this environment (which may bloat the tarball; the LCG 98 jupyter is reasonably recent), the environment is not visible inside the LCG jupyter server. From a shell with the virtual environment activated, you can execute:
python -m ipykernel install --user --name=coffeaenv
to make a new kernel available that uses this environment.
For Developers
Download source:
git clone https://github.com/CoffeaTeam/coffea
Install with development dependencies:
cd coffea
pip install --editable .[dev]
# or, if you need to work on the executors, e.g. dask:
pip install --editable .[dev,dask]
Develop a cool new feature or fix some bugs
Lint source, run tests, and build documentation:
flake8 coffea tests setup.py
black coffea tests setup.py
pytest tests
pushd docs && make html && popd
Make a pull request!