At CERN, we are used to dealing with a deluge of data, but the numerical value behind the word "deluge" has significantly evolved over the years. Today, the average amount of collision data recorded on disk by the LHC experiments is a little under 3 petabytes (PB) per day, almost equal to the amount that was recorded in one month during Run 1. So far, the current 2024 proton run has written about 200 PB on disk. This figure is not far from the 204 PB of data recorded during the whole of Run 2 (2015-2018). Data is set to crescendo in future runs and with future accelerators; the trend is already very clear but the details will evolve over the years.
In 2020, it became clear that with the Meyrin Data Centre alone, CERN would not be able to cope with all the data produced by the LHC experiments. The only viable solution to increase storage and processing capacity was to build a new data centre: the Prévessin Data Centre, inaugurated at the beginning of this year.
Fundamental differences exist between the two infrastructures: "The Meyrin Data Centre is highly robust and is fully covered by uninterruptible power supply (UPS) systems, but it can only support low power density in the racks," explains Wayne Salter, Head of the Fabric group in the IT department, in charge of the procurement, management and operation of the data centres. "This means that it is better suited for low density equipment that needs guaranteed power, such as storage. In the new Prévessin Data Centre, where we don't have full UPS coverage, we are concentrating the processing capacity - less vulnerable to power cuts - which is entirely made available to the Worldwide LHC Computing Grid (WLCG)."
All data produced at CERN still passes through the Meyrin Data Centre. It is the only place connected to all the experimental sites via the ultra-fast network of optical fibres. The data collection, copy and dispatching still happen here.
But beyond the data centres, the general upgrade of the IT services at CERN is a vital ongoing process. As part of this process, the Meyrin Data Centre Console service was retired at the end of March. Since then, both data centres have been monitored remotely, with critical functions being handled by the CERN Control Centre (CCC) operators.
Hands-on, round-the-clock interventions are no longer required in the data centres thanks to a combination of resilient and, in some cases, auto-remediating IT services. "From the 1960s to recently, console operators held a critical role in data storage, processing and the operation of computing equipment. They managed expansive tape libraries, a custodian of data storage," explains Olof Bärring, Head of the Fabric group's Infrastructure and Operations section, responsible for the operation of the data centres. "The manual operations and highly specialised skills of the console operators have underpinned the success of computing at CERN for decades. Their contribution has been invaluable and will go down in CERN history."
What other revelations does the future hold? How will CERN cope with the growing data production rate of future experiments? "It is probably too early to predict the exact volume of data that future accelerators will produce and how we will deal with it, but we know that big challenges lie ahead and only careful, early planning will allow our scientific community to keep up with them," replies Salter. To meet the needs of High-Luminosity LHC, the IT department already foresees an expansion of the Prévessin Data Centre in 2027, making more space and power available, as well as the continued monitoring of the growing needs for the Meyrin Data Centre.