{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reading data with coffea NanoEvents\n",
    "\n",
    "This is a rendered copy of [nanoevents.ipynb](https://github.com/CoffeaTeam/coffea/blob/master/binder/nanoevents.ipynb). You can optionally run it interactively on [binder at this link](https://mybinder.org/v2/gh/coffeateam/coffea/master?filepath=binder%2Fnanoevents.ipynb)\n",
    "\n",
    "NanoEvents is a Coffea utility to wrap flat nTuple structures (such as the CMS [NanoAOD](https://www.epj-conferences.org/articles/epjconf/pdf/2019/19/epjconf_chep2018_06021.pdf) format) into a single awkward array with appropriate object methods (such as Lorentz vector methods$^*$), cross references, and nested objects, all lazily accessed$^\\dagger$ from the source ROOT TTree via uproot. The interpretation of the TTree data is configurable via [schema objects](https://coffeateam.github.io/coffea/modules/coffea.nanoevents.html#classes), which are community-supplied  for various source file types. These schema objects allow a richer interpretation of the file contents than the [uproot.lazy](https://uproot4.readthedocs.io/en/latest/uproot4.behaviors.TBranch.lazy.html) methods. Currently available schemas include:\n",
    "\n",
    "   - `BaseSchema`, which provides a simple representation of the input TTree, where each branch is available verbatim as `events.branch_name`, effectively the same behavior as `uproot.lazy`.  Any branches that uproot supports at \"full speed\" (i.e. that are fully split and either flat or single-jagged) can be read by this schema;\n",
    "   - `NanoAODSchema`, which is optimized to provide all methods and cross-references in CMS NanoAOD format;\n",
    "   - `PFNanoAODSchema`, which builds a double-jagged particle flow candidate colllection `events.jet.constituents` from compatible PFNanoAOD input files;\n",
    "   - `TreeMakerSchema` which is designed to read TTrees made by [TreeMaker](https://github.com/TreeMaker/TreeMaker), an alternative CMS nTuplization format;\n",
    "   - `PHYSLITESchema`, for the ATLAS DAOD_PHYSLITE derivation, a compact centrally-produced data format similar to CMS NanoAOD; and\n",
    "   - `DelphesSchema`, for reading Delphes fast simulation [nTuples](https://cp3.irmp.ucl.ac.be/projects/delphes/wiki/WorkBook/RootTreeDescription).\n",
    "\n",
    "We welcome contributions for new schemas, and can assist with the design of them.\n",
    "\n",
    "$^*$ Vector methods are currently made possible via the [coffea vector](https://coffeateam.github.io/coffea/modules/coffea.nanoevents.methods.vector.html) methods mixin class structure. In a future version of coffea, they will instead be provided by the dedicated scikit-hep [vector](https://vector.readthedocs.io/en/latest/) library, which provides a more rich feature set. The coffea vector methods predate the release of the vector library.\n",
    "\n",
    "$^\\dagger$ _Lazy_ access refers to only fetching the needed data from the (possibly remote) file when a sub-array is first accessed. The sub-array is then _materialized_ and subsequent access of the sub-array uses a cached value in memory. As such, fully materializing a `NanoEvents` object may require a significant amount of memory.\n",
    "\n",
    "\n",
    "In this demo, we will use NanoEvents to read a small CMS NanoAOD sample. The events object can be instantiated as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import awkward as ak\n",
    "from coffea.nanoevents import NanoEventsFactory, NanoAODSchema\n",
    "\n",
    "NanoAODSchema.warn_missing_crossrefs = False\n",
    "\n",
    "fname = \"https://raw.githubusercontent.com/CoffeaTeam/coffea/master/tests/samples/nano_dy.root\"\n",
    "events = NanoEventsFactory.from_root(\n",
    "    {fname: \"Events\"},\n",
    "    schemaclass=NanoAODSchema,\n",
    "    metadata={\"dataset\": \"DYJets\"},\n",
    ").events()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the factory constructor, we also pass the desired schema version (the latest version of NanoAOD can be built with `schemaclass=NanoAODSchema`) for this file and some extra metadata that we can later access with `events.metadata`. In a later example, we will show how to set up this metadata in coffea processors where the `events` object is pre-created for you. Consider looking at the [from_root](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoEventsFactory.html#coffea.nanoevents.NanoEventsFactory.from_root) class method to see all optional arguments.\n",
    "\n",
    "The `events` object is an awkward array, which at its top level is a record array with one record for each \"collection\", where a collection is a grouping of fields (TBranches) based on the naming conventions of [NanoAODSchema](https://coffeateam.github.io/coffea/api/coffea.nanoevents.NanoAODSchema.html). For example, in the file we opened, the branches:\n",
    "```\n",
    "Generator_binvar\n",
    "Generator_scalePDF\n",
    "Generator_weight\n",
    "Generator_x1\n",
    "Generator_x2\n",
    "Generator_xpdf1\n",
    "Generator_xpdf2\n",
    "Generator_id1\n",
    "Generator_id2\n",
    "```\n",
    "are grouped into one sub-record named `Generator` which can be accessed using either getitem or getattr syntax, i.e. `events[\"Generator\"]` or `events.Generator`. e.g."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[1,\n",
       " -1,\n",
       " -1,\n",
       " 21,\n",
       " 21,\n",
       " 4,\n",
       " 2,\n",
       " -2,\n",
       " 2,\n",
       " 1,\n",
       " ...,\n",
       " 1,\n",
       " -2,\n",
       " 2,\n",
       " 1,\n",
       " 2,\n",
       " -2,\n",
       " -1,\n",
       " 2,\n",
       " 1]\n",
       "--------------------------------------------------------------\n",
       "type: 40 * int32[parameters={&quot;__doc__&quot;: &quot;id of first parton&quot;}]</pre>"
      ],
      "text/plain": [
       "<Array [1, -1, -1, 21, 21, ..., -2, -1, 2, 1] type='40 * int32[parameters={...'>"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.Generator.id1.compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['binvar', 'scalePDF', 'weight', 'x1', 'x2', 'xpdf1', 'xpdf2', 'id1', 'id2']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# all names can be listed with:\n",
    "events.Generator.fields"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In CMS NanoAOD, each TBranch has a self-documenting help string embedded in the title field, which is carried into the NanoEvents, e.g. executing the following cell should produce a help pop-up:\n",
    "```\n",
    "Type:            Array\n",
    "String form:     [1, -1, -1, 21, 21, 4, 2, -2, 2, 1, 3, 1, ... -1, -1, 1, -2, 2, 1, 2, -2, -1, 2, 1]\n",
    "Length:          40\n",
    "File:            ~/src/awkward-1.0/awkward1/highlevel.py\n",
    "Docstring:       id of first parton\n",
    "Class docstring: ...\n",
    "```\n",
    "where the `Docstring` shows information about the content of this array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\u001b[0;31mType:\u001b[0m            Array\n",
       "\u001b[0;31mString form:\u001b[0m     dask.awkward<id1, npartitions=1>\n",
       "\u001b[0;31mFile:\u001b[0m            /opt/homebrew/lib/python3.11/site-packages/dask_awkward/lib/core.py\n",
       "\u001b[0;31mDocstring:\u001b[0m       id of first parton\n",
       "\u001b[0;31mClass docstring:\u001b[0m\n",
       "Partitioned, lazy, and parallel Awkward Array Dask collection.\n",
       "\n",
       "The class constructor is not intended for users. Instead use\n",
       "factory functions like :py:func:`~dask_awkward.from_parquet`,\n",
       ":py:func:`~dask_awkward.from_json`, etc.\n",
       "\n",
       "Within dask-awkward the ``new_array_object`` factory function is\n",
       "used for creating new instances.\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "events.Generator.id1?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Based on a collection's name or contents, some collections acquire additional _methods_, which are extra features exposed by the code in the mixin classes of the `coffea.nanoevents.methods` modules. For example, although `events.GenJet` has the fields:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['eta', 'mass', 'phi', 'pt', 'partonFlavour', 'hadronFlavour']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.GenJet.fields"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "we can access additional attributes associated to each generated jet by virtue of the fact that they can be interpreted as [Lorentz vectors](https://coffeateam.github.io/coffea/api/coffea.nanoevents.methods.vector.LorentzVector.html#coffea.nanoevents.methods.vector.LorentzVector):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[217, 670, 258],\n",
       " [34.5, 98.3, 1.16e+03, 38.1, 20.4, 29.7],\n",
       " [306, 62.8, 74.1, 769, 11.2],\n",
       " [170, 117, 29.3, 45.9],\n",
       " [101, 117, 129, 15.6],\n",
       " [63.1, 37.2, 33.7, 36.2],\n",
       " [303, 50.5, 1.29e+03, 278],\n",
       " [615, 282, 2.11e+03],\n",
       " [195, 47.6],\n",
       " [95, 44.6, 223, 318, 30, 108, 62.9],\n",
       " ...,\n",
       " [41.6, 36.7, 78.9, 13],\n",
       " [1.51e+03, 1.23e+03],\n",
       " [152, 160, 777, 27.1, 346, 65.1, 37.9, 27.2, 16.3],\n",
       " [35.4, 20.4],\n",
       " [20.1, 16.2],\n",
       " [34],\n",
       " [553, 283],\n",
       " [771, 452, 16],\n",
       " [76.9]]\n",
       "----------------------------------------------------\n",
       "type: 40 * var * float32</pre>"
      ],
      "text/plain": [
       "<Array [[217, 670, 258], [34.5, ...], ..., [76.9]] type='40 * var * float32'>"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.GenJet.energy.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can call more complex methods, like computing the distance $\\Delta R = \\sqrt{\\Delta \\eta^2 + \\Delta \\phi ^2}$ between two LorentzVector objects:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[],\n",
       " [3.13],\n",
       " [3.45, 2.18],\n",
       " [1.58, 3.76],\n",
       " [],\n",
       " [0.053],\n",
       " [0.0748],\n",
       " [],\n",
       " [],\n",
       " [1.82],\n",
       " ...,\n",
       " [0.00115],\n",
       " [],\n",
       " [0.0149],\n",
       " [],\n",
       " [0.0308],\n",
       " [],\n",
       " [0.0858],\n",
       " [],\n",
       " []]\n",
       "------------------------\n",
       "type: 40 * var * float32</pre>"
      ],
      "text/plain": [
       "<Array [[], [3.13], [3.45, 2.18], ..., [], []] type='40 * var * float32'>"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# find distance between leading jet and all electrons in each event\n",
    "dr = events.Jet[:, 0].delta_r(events.Electron)\n",
    "dr.compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[None,\n",
       " 3.13,\n",
       " 2.18,\n",
       " 1.58,\n",
       " None,\n",
       " 0.053,\n",
       " 0.0748,\n",
       " None,\n",
       " None,\n",
       " 1.82,\n",
       " ...,\n",
       " 0.00115,\n",
       " None,\n",
       " 0.0149,\n",
       " None,\n",
       " 0.0308,\n",
       " None,\n",
       " 0.0858,\n",
       " None,\n",
       " None]\n",
       "-------------------\n",
       "type: 40 * ?float32</pre>"
      ],
      "text/plain": [
       "<Array [None, 3.13, 2.18, 1.58, ..., 0.0858, None, None] type='40 * ?float32'>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# find minimum distance\n",
    "ak.min(dr, axis=1).compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[None, None, None, None, None], [Electron, Electron, ..., Electron], ..., [None, None]]\n"
     ]
    }
   ],
   "source": [
    "# a convenience method for this operation on all jets is available\n",
    "print(events.Jet.nearest(events.Electron).compute())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The assignment of methods classes to collections is done inside the schema object during the initial creation of the array, governed by the awkward array's `__record__` parameter and the associated behavior. See [ak.behavior](https://awkward-array.readthedocs.io/en/latest/ak.behavior.html) for a more detailed explanation of array behaviors.\n",
    "\n",
    "Additional methods provide convenience functions for interpreting some branches, e.g. CMS NanoAOD packs several jet identification flag bits into a single integer, `jetId`. By implementing the bit-twiddling in the [Jet mixin](https://github.com/CoffeaTeam/coffea/blob/7045c06b9448d2be4315e65d432e6d8bd117d6d7/coffea/nanoevents/methods/nanoaod.py#L279-L282), the analsyis code becomes more clear:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[6, 6, 6, 6, 6], [6, 2, 6, 6, 6, 6, 6, 0], ..., [6, 6, 0, ..., 6, 6], [6, 6]]\n",
      "[[True, True, True, True, True], [True, True, ..., False], ..., [True, True]]\n"
     ]
    }
   ],
   "source": [
    "print(events.Jet.jetId.compute())\n",
    "print(events.Jet.isTight.compute())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also define convenience functions to unpack and apply some mask to a set of flags, e.g. for generated particles:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Raw status flags: [[10625, 27009, 4481, 22913, 257, ..., 13884, 13884, 13884, 12876, 12876], ...]\n"
     ]
    }
   ],
   "source": [
    "print(f\"Raw status flags: {events.GenPart.statusFlags.compute()}\")\n",
    "events.GenPart.hasFlags(['isPrompt', 'isLastCopy']);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "CMS NanoAOD also contains pre-computed cross-references for some types of collections. For example, there is a TBranch `Electron_genPartIdx` which indexes the `GenPart` collection per event to give the matched generated particle, and `-1` if no match is found. NanoEvents transforms these indices into an awkward _indexed array_ pointing to the collection, so that one can directly access the matched particle using getattr syntax:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[],\n",
       " [-11],\n",
       " [-11, 11],\n",
       " [22, None],\n",
       " [],\n",
       " [None],\n",
       " [None],\n",
       " [],\n",
       " [],\n",
       " [11],\n",
       " ...,\n",
       " [11],\n",
       " [],\n",
       " [11],\n",
       " [],\n",
       " [-11],\n",
       " [],\n",
       " [None],\n",
       " [],\n",
       " []]\n",
       "---------------------------------------------------------\n",
       "type: 40 * var * ?int32[parameters={&quot;__doc__&quot;: &quot;PDG id&quot;}]</pre>"
      ],
      "text/plain": [
       "<Array [[], [-11], [-11, 11], ..., [], []] type='40 * var * ?int32[paramete...'>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.Electron.matched_gen.pdgId.compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[84.4, 29.4],\n",
       " [31.1],\n",
       " [53.4, 81.9],\n",
       " [29.2],\n",
       " [17.5],\n",
       " [65.9, 47.8],\n",
       " [58.5, 44.7],\n",
       " [50.2, 45.2],\n",
       " [33.3, 25.9],\n",
       " [None],\n",
       " [26.1],\n",
       " [25.8]]\n",
       "-------------------------------------------------------\n",
       "type: 12 * var * ?float32[parameters={&quot;__doc__&quot;: &quot;pt&quot;}]</pre>"
      ],
      "text/plain": [
       "<Array [[84.4, 29.4], [31.1], ..., [25.8]] type='12 * var * ?float32[parame...'>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.Muon[ak.num(events.Muon)>0].matched_jet.pt.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For generated particles, the parent index is similarly mapped:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[None, None, 1, 1, 23, 23, 23, 23, ..., 15, -15, -15, -15, -15, -15, 111, 111],\n",
       " [None, None, -1, 23, 23, 23, 23, ..., -11, None, None, None, None, None, 433],\n",
       " [None, None, -1, -1, 23, 23, 23, 23, ..., -423, -1, -1, -421, -421, 111, 111],\n",
       " [None, None, 21, 21, 23, -1, 23, 23, ..., -15, -15, -15, -15, -15, 111, 111],\n",
       " [None, None, 21, 21, 23, 23, 23, 23, ..., 13, 13, -13, 1, None, None, 2, 2],\n",
       " [None, None, 4, 23, 23, 23, 23, 23, ..., -15, -15, -15, 15, 15, 15, 423, 311],\n",
       " [None, None, 2, 2, 2, 23, 23, 2, 23, ..., -13, 13, 2, 2, 2, 2, 111, 111, 111],\n",
       " [None, None, -2, -2, 23, 21, 21, 23, 23, ..., 21, 21, 21, 21, None, 423, 2, 2],\n",
       " [None, None, 2, 23, 23, 23, 23, 23, ..., -15, -15, -15, -15, 111, 111, 311],\n",
       " [None, None, 1, 1, 1, 23, 21, 23, 23, 23, ..., -411, 21, 21, 1, 1, 1, 1, 3, 3],\n",
       " ...,\n",
       " [None, None, 1, 23, 23, 23, 23, 23, ..., -15, -15, -15, -15, 1, 1, 111, 111],\n",
       " [None, None, -2, 23, 23, 23, 23, 23, 13, 13, -13, -13, -13, 13],\n",
       " [None, None, 2, 2, 2, 23, 2, 23, ..., None, -413, 413, 413, 2, 2, -421, -421],\n",
       " [None, None, 1, 23, 23, 23, 23, 23, ..., -15, 15, 15, 15, -15, -15, -15, -15],\n",
       " [None, None, 2, 2, 23, 23, 21, 23, ..., 15, 15, 15, -15, -15, -15, 111, 111],\n",
       " [None, None, -2, 23, 23, None, 23, 23, ..., -15, -15, -15, 423, 4, 4, 3, 3],\n",
       " [None, None, -1, 23, 23, 23],\n",
       " [None, None, 2, 23, 23, 23, 23, -11, -11, 11],\n",
       " [None, None, 1, 1, 23, 23, 23, 23, ..., -15, -15, -15, 111, 111, 111, 111]]\n",
       "--------------------------------------------------------------------------------\n",
       "type: 40 * var * ?int32[parameters={&quot;__doc__&quot;: &quot;PDG id&quot;}]</pre>"
      ],
      "text/plain": [
       "<Array [[None, None, 1, ..., 111, 111], ...] type='40 * var * ?int32[parame...'>"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.GenPart.parent.pdgId.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition, using the parent index, a helper method computes the inverse mapping, namely, `children`. As such, one can find particle siblings with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[None, None, [23, 21], ..., [-16, 111, ..., 211, -211], [22, 22], [22, 22]],\n",
       " [None, None, [23], [23], [23], [23], ..., None, None, None, None, None, [431]],\n",
       " [None, None, [23, -1], [23, -1], [23], ..., [13, -14], [13, -14], [22], [22]],\n",
       " [None, None, [23, -1], ..., [-16, 111, ..., 211, -211], [22, 22], [22, 22]],\n",
       " [None, None, [23, 1], [23, 1], [23], ..., None, None, [11, -11], [11, -11]],\n",
       " [None, None, [23], [23], ..., [16, 13, -14], [16, 13, -14], [421], [310]],\n",
       " [None, None, [23, 2, 2], [23, 2, 2], ..., [...], [22], [11, -11], [11, -11]],\n",
       " [None, None, [23, 21], [23, 21], [23], ..., None, [421], [11, -11], [11, -11]],\n",
       " [None, None, [23], [23], ..., [-16, 111, ..., 311], [22, 22], [22, 22], [310]],\n",
       " [None, None, [23, 21, 21], [23, ...], ..., [11, -11], [11, -11], [11, -11]],\n",
       " ...,\n",
       " [None, None, [23], [23], [23], ..., [11, -11], [11, -11], [22, 22], [22, 22]],\n",
       " [None, None, [23], [23], [23], ..., [...], [-13], [-13, 22], [-13, 22], [13]],\n",
       " [None, None, [23, 2, 21], ..., [2, 21, ..., 11, -11], [13, -14], [13, -14]],\n",
       " [None, None, [23], ..., [...], [-16, 211, 211, -211], [-16, 211, 211, -211]],\n",
       " [None, None, [23, 21], [23, 21], ..., [-16, -11, 12], [22, 22], [22, 22]],\n",
       " [None, None, [23], [23], ..., [423, -421, 11, -11], [11, -11], [11, -11]],\n",
       " [None, None, [23], [23], [-13, 13], [-13, 13]],\n",
       " [None, None, [23], [23], [23], ..., [-11, 11], [-11, 22], [-11, 22], [11]],\n",
       " [None, None, [23, 21], [23, 21], ..., [22, ...], [22, 22], [22, 22], [22, 22]]]\n",
       "--------------------------------------------------------------------------------\n",
       "type: 40 * var * option[var * ?int32[parameters={&quot;__doc__&quot;: &quot;PDG id&quot;}]]</pre>"
      ],
      "text/plain": [
       "<Array [[None, None, ..., [22, 22]], ...] type='40 * var * option[var * ?in...'>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.GenPart.parent.children.pdgId.compute()\n",
    "# notice this is a doubly-jagged array"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since often one wants to shortcut repeated particles in a decay sequence, a helper method `distinctParent` is also available. Here we use it to find the parent particle ID for all prompt electrons:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[],\n",
       " [23, 23],\n",
       " [23, 23],\n",
       " [],\n",
       " [],\n",
       " [],\n",
       " [],\n",
       " [23, 23],\n",
       " [],\n",
       " [23, 23],\n",
       " ...,\n",
       " [],\n",
       " [],\n",
       " [23, 23],\n",
       " [],\n",
       " [],\n",
       " [],\n",
       " [],\n",
       " [23, 23],\n",
       " []]\n",
       "---------------------------------------------------------\n",
       "type: 40 * var * ?int32[parameters={&quot;__doc__&quot;: &quot;PDG id&quot;}]</pre>"
      ],
      "text/plain": [
       "<Array [[], [23, 23], [...], ..., [23, 23], []] type='40 * var * ?int32[par...'>"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events.GenPart[\n",
    "    (abs(events.GenPart.pdgId) == 11)\n",
    "    & events.GenPart.hasFlags(['isPrompt', 'isLastCopy'])\n",
    "].distinctParent.pdgId.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Events can be filtered like any other awkward array using boolean fancy-indexing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[94.6,\n",
       " 87.6,\n",
       " 88,\n",
       " 90.4,\n",
       " 89.1,\n",
       " 31.6]\n",
       "-----------------\n",
       "type: 6 * float32</pre>"
      ],
      "text/plain": [
       "<Array [94.6, 87.6, 88, 90.4, 89.1, 31.6] type='6 * float32'>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mmevents = events[ak.num(events.Muon) == 2]\n",
    "zmm = mmevents.Muon[:, 0] + mmevents.Muon[:, 1]\n",
    "zmm.mass.compute()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[94.6,\n",
       " 87.6,\n",
       " 88,\n",
       " 90.4,\n",
       " 89.1,\n",
       " 31.6]\n",
       "-----------------\n",
       "type: 6 * float32</pre>"
      ],
      "text/plain": [
       "<Array [94.6, 87.6, 88, 90.4, 89.1, 31.6] type='6 * float32'>"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# a convenience method is available to sum vectors along an axis:\n",
    "mmevents.Muon.sum(axis=1).mass.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected for this sample, most of the dimuon events have a pair invariant mass close to that of a Z boson. But what about the last event? Let's take a look at the generator information:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[-13, 13]\n",
      "[False, False]\n"
     ]
    }
   ],
   "source": [
    "print(mmevents[-1].Muon.matched_gen.pdgId.compute())\n",
    "print(mmevents[-1].Muon.matched_gen.hasFlags([\"isPrompt\"]).compute())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So they are real generated muons, but they are not prompt (i.e. from the initial decay of a heavy resonance)\n",
    "\n",
    "Let's look at their parent particles:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[-15,\n",
       " 15]\n",
       "--------------------------------------------------\n",
       "type: 2 * ?int32[parameters={&quot;__doc__&quot;: &quot;PDG id&quot;}]</pre>"
      ],
      "text/plain": [
       "<Array [-15, 15] type='2 * ?int32[parameters={\"__doc__\": \"PDG id\"}]'>"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mmevents[-1].Muon.matched_gen.parent.pdgId.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "aha! They are muons coming from tau lepton decays, and hence a fair amount of the Z mass is carried away by the neutrinos:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "31.265245\n",
      "91.68365\n"
     ]
    }
   ],
   "source": [
    "print(mmevents[-1].Muon.matched_gen.sum().mass.compute())\n",
    "print(mmevents[-1].Muon.matched_gen.parent.sum().mass.compute())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One can assign new variables to the arrays, with some caveats:\n",
    "\n",
    " * Assignment must use setitem (`events[\"path\", \"to\", \"name\"] = value`)\n",
    " * Assignment to a sliced `events` won't be accessible from the original variable\n",
    " * New variables are not visible from cross-references"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre>[[],\n",
       " [121],\n",
       " [],\n",
       " [],\n",
       " [],\n",
       " []]\n",
       "-----------------------\n",
       "type: 6 * var * float32</pre>"
      ],
      "text/plain": [
       "<Array [[], [121], [], [], [], []] type='6 * var * float32'>"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mmevents[\"Electron\", \"myvariable\"] = mmevents.Electron.pt + zmm.mass\n",
    "mmevents.Electron.myvariable.compute()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}