Ntuple generation with DaVinci and in-house offline components. Please refer to the project wiki for more details about the installation, usage, and data sources of this project.
To set up the repository, type in a terminal:
git clone [email protected]:umd-lhcb/lhcb-ntuples-gen
cd lhcb-ntuples-gen
git remote add julian [email protected]:lhcb-ntuples-gen
git remote add glacier [email protected]:lhcb-ntuples-gen
git annex init --version=7
git submodule update --init # Do this before git annex sync to avoid potential mess-up of submodule pointers!
git annex sync
nix develop ## Can take an hour
make install-dep
make install-dep-pip ## To install packages needed for JpsiK reweighting, including zfit
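Since the setup above depends on several external tools, it can save time to confirm they are on your PATH before starting. The following is a minimal sketch, not part of this repository: check_tools is a hypothetical helper, and the tool list is inferred from the commands above.

```shell
# Hypothetical helper (not provided by this repository): report every tool
# that is not on PATH. Returns 0 only when all listed tools are found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Tools invoked by the setup commands above.
check_tools git git-annex nix make ||
    echo "Install the missing tools before continuing." >&2
```

The check only confirms that the executables exist; it does not verify versions.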
Development of the DaVinci scripts can be done locally on your laptop by running our docker image of DaVinci. Install docker as described in the wiki and pull the image with
docker pull umdlhcb/lhcb-stack-cc7:DaVinci-v45r6-SL
For instance, to test the standard data script, first pull the example .dst files, then enter docker, and finally run the script:
git annex get run2-rdx/data/data-2016-md/00102837*
make docker-dv
cd run2-rdx
./run.sh conds/cond-std-2016.py
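If the script fails to open its input, one thing to check is whether the .dst files were actually fetched. A rough sketch, assuming the files are in git-annex's default locked mode, where unfetched content shows up as a dangling symlink; list_unfetched is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper: list annexed files under directory $1 whose content
# has not been fetched, assuming locked (symlink) mode.
list_unfetched() {
    # A symlink that fails `test -e` points at content that is not present.
    find "$1" -type l ! -exec test -e {} \; -print
}

# Directory used in the example above; skipped if it does not exist here.
dst_dir=run2-rdx/data/data-2016-md
if [ -d "$dst_dir" ]; then
    unfetched=$(list_unfetched "$dst_dir")
    [ -z "$unfetched" ] || printf 'not fetched yet:\n%s\n' "$unfetched" >&2
fi
```

This is meant to be run from the repository root, outside docker. Files kept in unlocked mode are not symlinks and would not be detected this way.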
After your script does what you want, you are ready to send ganga jobs to the LHCb grid as detailed in the wiki.
The step-1 ntuples coming out of DaVinci are processed with the babymaker, a neat script that allows for easy branch renaming and deleting, as well as cut selection and calculation of new branches. This is configured in YAML files.
For instance, the tracker-only MC ntuples used to produce the fit templates use postprocess/rdx-run2/rdx-run2_oldcut.yml.
These ntuples are currently produced by first downloading the step-1 ntuples from the annex. Since these are over 1 TB, this is typically done on glacier inside a tmux session:
tmux
git annex get ntuples/0.9.6-2016_production/Dst_D0-mc-tracker_only
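Because the download is over 1 TB, it may be worth checking the free disk space before starting it. A minimal sketch, assuming POSIX df and awk are available; has_free_kb is a hypothetical helper, and the 1 TiB threshold is only a rough lower bound, not the exact size of the ntuples.

```shell
# Hypothetical helper: succeed if the filesystem holding path $1 has at
# least $2 KiB available.
has_free_kb() {
    # With -P, df prints one header line and one data line; column 4 is
    # the available space in KiB (because of -k).
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# 1 TiB expressed in KiB. The real download is "over 1 TB", so treat this
# as a lower bound rather than the exact requirement.
need_kb=$((1024 * 1024 * 1024))
has_free_kb . "$need_kb" ||
    echo "Less than 1 TiB free here; git annex get may not fit." >&2
```

Run it from the directory where the annexed content will be stored, since df reports on the filesystem that holds the given path.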
The generation of the step-2 babies can be quite slow, currently taking about two days to run, mainly because of the normalization, and likely also because HAMMER FF weights are recalculated each time. TODO: to avoid this, the weights ought to be cached by saving them to the subfolders in ntuples/0.9.6-2016_production/Dst_D0-mc-tracker_only. The ntupling is run with the following (specific options can be found inside workflows/rdx.py):
tmux
cd workflows
## Takes 7:11, output is 299GB
./rdx.py Dst_D0-mc-tracker_only-sig_norm 2>&1 | tee step2-ntuple_mc-to-sig-norm.log
## Takes 0:09, output is 2.9GB
./rdx.py Dst_D0-mc-tracker_only-D_s 2>&1 | tee step2-ntuple_mc-to-d_s.log
## Takes 1:13, output is 1.3GB
./rdx.py Dst_D0-mc-fullsim-Lb 2>&1 | tee step2-ntuple_mc-fullsim-lb.log
## Takes 1:05, output is 31GB
./rdx.py Dst_D0-mc-tracker_only-Dstst_heavy 2>&1 | tee step2-ntuple_mc-to-dstst-heavy.log
## Takes 1:56, output is 83GB
./rdx.py Dst_D0-mc-tracker_only-DDX 2>&1 | tee step2-ntuple_mc-to-ddx.log
## Takes 3:29, output is 82GB
./rdx.py Dst_D0-mc-tracker_only-Dstst 2>&1 | tee step2-ntuple_mc-to-dstst.log
## Takes 0:40, output is 9.7GB
./rdx.py Dst_D0-std 2>&1 | tee step2-ntuple_data.log
## Takes 1:56, output is 27GB
./rdx.py Dst_D0-mu_misid 2>&1 | tee step2-ntuple_mu_misid.log
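One caveat with the commands above: in a pipeline, the shell reports the exit status of the last command (tee), so a failure of rdx.py can go unnoticed in a long unattended run. The sketch below shows one way to keep both the log and the real exit status. run_mode and the RDX_CMD variable are hypothetical names introduced for illustration; they are not part of workflows/rdx.py.

```shell
# Command to run; defaults to the workflow script used above.
RDX_CMD=${RDX_CMD:-./rdx.py}

# Hypothetical wrapper: run one workflow mode ($1), mirror its output to a
# log file ($2), and return the workflow's own exit status, not tee's.
run_mode() {
    mode=$1
    log=$2
    status_file=$log.status
    # Record the exit status on the left side of the pipe, since the
    # pipeline itself only reports the status of tee.
    { rc=0; "$RDX_CMD" "$mode" 2>&1 || rc=$?; echo "$rc" > "$status_file"; } |
        tee "$log"
    status=$(cat "$status_file")
    rm -f "$status_file"
    return "$status"
}

# Example (not executed here):
#   run_mode Dst_D0-std step2-ntuple_data.log || echo "Dst_D0-std failed" >&2
```

In bash, set -o pipefail achieves a similar effect; the version above avoids it so that it also works in a plain POSIX sh.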
After the ntuple generation, the ntuples are moved to rdx-run2-analysis with something like
mkdir ../../rdx-run2-analysis/ntuples/<folder>
mv ../gen/<date>*/ntuple_merged/* ../../rdx-run2-analysis/ntuples/<folder>
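A plain mv silently overwrites files that already exist in the destination. If that is a concern, the move can be wrapped as in the sketch below. move_ntuples is a hypothetical helper, not part of this repository, and it takes a single source directory rather than the glob used above.

```shell
# Hypothetical helper: move every file from directory $1 into directory $2,
# creating $2 if needed and skipping names that already exist there.
move_ntuples() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        name=$(basename "$f")
        if [ -e "$dest/$name" ]; then
            echo "skipping existing: $dest/$name" >&2
        else
            mv "$f" "$dest/"
        fi
    done
}

# Example (not executed here; <date> and <folder> are placeholders as above):
#   move_ntuples "../gen/<date>/ntuple_merged" "../../rdx-run2-analysis/ntuples/<folder>"
```

Skipped files stay in the source directory, so they can be inspected and moved by hand afterwards.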
This generation relies on various auxiliary ntuples and weights. Some aux ntuples need to be generated prior to running the above commands. Namely:
- B occupancy/kinematic MC correction weights (from B -> J/psi K events), described in run2-JpsiK/README.md, are stored in run2-rdx/reweight/JpsiK/root-run2-JpsiK
- Long track reco eff MC correction weights (from J/psi -> mu mu events), described a bit more in this comment and making use of LHCb's TrackCalib package, are stored in run2-rdx/reweight/tracking/root-run2-general
- PID weights to implement the PID cuts (DLLK, DLLmu, DLLe, isMuon, uBDT) and skim PID selections (NNK, NNghost) present in data for our tracker-only MC. These make use of LHCb's PIDCalib (we also have a local fork to incorporate uBDT), are generated with these shell scripts for mu PID, K/pi PID, skim sel PID and with all efficiencies shifted positive, and are stored in run2-rdx/reweight/pid/root-run2-rdx_oldcut-shifted
- Vertex smearing weights to compensate for the incomplete MC final reweighting of vertex resolution (smears the B flight vector according to data-driven corrections). Currently run1 corrections are used, stored in run2-rdx/reweight/vertex/smearing_vec.root (weights calculated in our vertex-resolution repo)
- misID efficiencies and DiF smearing weights, used in misID unfolding (calculated in and then applied using a script in our misid-unfold repo), are stored in run2-rdx/reweight/misid/histos
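Since a step-2 run takes hours, it may be useful to confirm that the pre-generated inputs listed above are in place before launching it. A minimal sketch using the storage paths from the list; check_aux_inputs is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper: check that the pre-generated auxiliary inputs exist
# under repository root $1. Returns non-zero if any of them is missing.
check_aux_inputs() {
    root=$1
    missing=0
    for rel in \
        run2-rdx/reweight/JpsiK/root-run2-JpsiK \
        run2-rdx/reweight/tracking/root-run2-general \
        run2-rdx/reweight/pid/root-run2-rdx_oldcut-shifted \
        run2-rdx/reweight/vertex/smearing_vec.root \
        run2-rdx/reweight/misid/histos
    do
        # -e follows symlinks, so a dangling link also counts as missing.
        if [ ! -e "$root/$rel" ]; then
            echo "missing: $rel" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not executed here), run from the workflows folder:
#   check_aux_inputs .. && ./rdx.py Dst_D0-std
```

For the directory entries, this only checks that the directory exists, not that every histogram file inside it has been fetched.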
The other auxiliary ntuples are calculated on the fly if not cached:
- Form-factor weights, calculated in Hammer (via code in our hammer-reweight repo) and applied to signal, normalization, and D**(s)
- Trigger emulation weights to implement L0Hadron TOS, L0Global TIS, HLT1 triggers for our tracker-only MC, calculated in our TrackerOnlyEmu repo
The step-2 ntuples (written to ntuple_merged folders) can then be copied to rdx-run2-analysis/ntuples and annexed, and will be used in that repository to produce the fit templates and other studies.
MC weights are saved in histograms that we store in run2-rdx/reweight/pid/root-run2-rdx_oldcut-shifted. These histograms are calculated with the pidcalib2 package. We have three sets of scripts:
- pidcalib2/efficiency_gen/rdx-run2-ubdt.sh for the muon PID, which is to be run on glacier and takes 15 min to run.
- lhcb-ntuples-gen/reweight/pid/run2-rdx_oldcut.sh for the kaon and pion PID, which is run on lxplus and takes 50 min to run.
- misid-unfold/spec/rdx-run2.yml for the misID unfold species.
If you want to add new weights, you should calculate the histogram, copy it to that folder, and include a branch by modifying run2-rdx/reweight/pid/run2-rdx_oldcut.yml.