The Large Hadron Collider Beauty (LHCb) experiment at CERN is the world's leading experiment in quark flavour physics, with a broad particle physics programme. Its data from Runs 1 and 2 of the Large Hadron Collider (LHC) has so far been used in more than 600 scientific publications, including a number of significant discoveries. While all scientific results from the LHCb collaboration are already publicly available through open access papers, the data used by the researchers to produce these results is now accessible to anyone in the world through the CERN open data portal. The data release is made in the context of CERN's Open Science Policy, reflecting the values of transparency and international collaboration enshrined in the CERN Convention for more than 60 years.
"The data collected at LHCb is a unique legacy to humanity, especially since no other experiment covers the region LHCb looks at," says Sebastian Neubert, leader of the LHCb open data project. "It has been obtained through a huge international collaborative effort, which was funded by the public. Therefore the data belongs to society."
The data sample made available amounts to 20% of the total data set collected by the LHCb experiment in 2011 and 2012 during LHC Run 1. It comprises 200 terabytes of information from proton-proton collision events that were filtered and recorded with the detector. The LHCb collaboration has preprocessed the data by reconstructing experimental signatures, such as the trajectories of charged particles, from the raw information delivered by its complex detector system. The data is filtered, classified according to approximately 300 processes and decays, and made available in the same format as that used by LHCb physicists.
The analysis of LHC data is a complex and time-consuming exercise. To make it easier, the samples are accompanied by extensive documentation and metadata, as well as a glossary explaining several hundred special terms used in the preprocessing. The data can be analysed using dedicated LHCb algorithms, which are available as open source software.
The data is suitable for different types of physics studies and can be directly downloaded by anyone. "It is intended to be used by professional scientists and its interpretation needs some knowledge of particle physics, but everybody is invited to give it a try," continues Neubert. "It would be great if the data inspires new research directions and is used by researchers in other fields, such as data science and artificial intelligence. We are eager to hear from users of the data what they find."