In a significant development for computational science, a team of National Nuclear Security Administration (NNSA) Tri-Lab researchers has unveiled a new approach to molecular dynamics (MD) simulations using the Cerebras Wafer-Scale Engine (WSE), the world's largest computer chip.
Running on the second-generation Cerebras WSE-2, a processor with 850,000 cores, the team from Lawrence Livermore National Laboratory (LLNL), Los Alamos National Laboratory (LANL), Sandia National Laboratories and Cerebras Systems demonstrated that the chip can perform complex simulations involving hundreds of thousands of atoms at speeds previously thought unattainable. As described in a recent paper, the team achieved timestep rates more than 450 times higher than those of exascale systems such as Oak Ridge National Laboratory's Frontier. The work is a finalist for the 2024 Association for Computing Machinery (ACM) Gordon Bell Prize, the highest honor in supercomputing.
Molecular dynamics simulations are critical for understanding the behavior of materials at the atomic level, driving advances in fields such as materials science, biophysics and drug design. MD simulations are of particular interest to the NNSA labs, where they are essential for exploring how materials behave when experiments are too expensive or cannot reach the relevant conditions of temperature, pressure, time scale and length scale, researchers said.
On traditional supercomputers, large-scale simulations often take weeks or months to complete. Unlike those machines, which rely on distributed memory systems, the WSE is a monolithic processor built from a single silicon wafer. This design allows its hundreds of thousands of cores to operate in parallel on one chip, enabling the simulation of large atomic systems with remarkable efficiency.
By dedicating a processor core to each simulated atom, the WSE-2 achieved a 457-fold improvement in timesteps per second compared to Frontier, currently the leading GPU-based exascale platform. In their tests, the team reached a simulation rate of more than 699,000 timesteps per second for problems involving 800,000 tantalum atoms, setting a new record for general-purpose processors and for what is possible in molecular dynamics.
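The article does not describe the team's kernel, so the following is only a minimal Python sketch of the one-atom-per-core idea under stated assumptions: the cores form a 2D grid, each grid cell stands in for one core that owns one atom, atoms in a solid stay near their home cores, and a Lennard-Jones pair potential in reduced units replaces the interatomic model actually used for tantalum (which the article does not give). The names `lj_force`, `core_forces` and `halo` are hypothetical. Each cell reads positions only from cells within `halo` grid steps, so the work per core stays fixed as the atom count grows.

```python
import numpy as np

def lj_force(r_vec, epsilon=1.0, sigma=1.0):
    """Lennard-Jones force on atom i due to atom j, with r_vec = x_i - x_j (reduced units).
    Stand-in potential for illustration only; not the tantalum model from the paper."""
    r2 = np.dot(r_vec, r_vec)
    sr6 = (sigma * sigma / r2) ** 3  # (sigma/r)^6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2 * r_vec

def core_forces(positions, halo=1):
    """Forces on a hypothetical 2D grid of cores, each owning exactly one atom.

    positions has shape (ny, nx, 2): positions[iy, ix] is the atom held by core (iy, ix).
    Each core only reads atoms from cores within `halo` grid steps, mimicking
    nearest-neighbor communication. The loops run serially here; on the wafer
    each core would evaluate its own cell concurrently.
    """
    ny, nx, _ = positions.shape
    forces = np.zeros_like(positions)
    for iy in range(ny):
        for ix in range(nx):
            for dy in range(-halo, halo + 1):
                for dx in range(-halo, halo + 1):
                    jy, jx = iy + dy, ix + dx
                    if (dy, dx) == (0, 0) or not (0 <= jy < ny and 0 <= jx < nx):
                        continue
                    forces[iy, ix] += lj_force(positions[iy, ix] - positions[jy, jx])
    return forces

# A 3 x 3 patch of a square lattice with spacing 1.2 (reduced units).
coords = np.arange(3) * 1.2
xx, yy = np.meshgrid(coords, coords, indexing="xy")
positions = np.stack([xx, yy], axis=-1)
forces = core_forces(positions)
print(forces[1, 1])  # center atom: neighbor forces should cancel by symmetry
```

In this sketch, with `halo=1` each core handles at most eight neighbor interactions per step whether the grid holds nine atoms or 800,000, which is why the per-step cost can stay flat as long as communication remains local.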
"The beauty of the Cerebras system is that because it's a general-purpose processor, it's not specialized to a particular algorithm, so you can use it for all different types of HPC simulations," said LLNL co-author and computer scientist Edgar Leon. "The number we were able to get with the molecular dynamics simulations broke a record in terms of what you could do today with the world's leading supercomputer."
Unleashing new frontiers in materials science
The team's results suggest that MD simulations that would take months or years to complete on today's most powerful supercomputers could be finished in a matter of days, allowing scientists to explore atomic interactions efficiently and accurately. For example, the team demonstrated that the WSE can simulate 300 microseconds of atomic interactions in a single day, a feat that could influence future materials design and open new avenues in materials science and in other atomistic-level challenges such as drug design and complex biological processes.
"In many types of simulations, including MD, progress is made by repeatedly predicting where positions or values will be a short time in the future and using the predicted values to advance simulation time by a small timestep," explained co-author Tomas Oppelstrup, formerly an LLNL computer scientist and now at Cerebras. "It is unusual to see rates of even thousands of timesteps per second, and here we achieved hundreds of thousands."
Oppelstrup explained that while computers specially designed to do only molecular dynamics of biological systems have achieved rates of nearly a million steps per second, "we have achieved close to the same rates on a more generally programmable computer, suggesting that these kinds of rates can be achieved for other types of simulations beyond materials science, and even molecular dynamics."
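The predict-and-advance loop Oppelstrup describes is a time integrator. The article does not name the integrator the team used; the Python sketch below uses velocity Verlet, a common choice in MD, applied to a single particle in a harmonic well as a stand-in for atoms interacting through a real potential. The function `velocity_verlet` and its parameters are illustrative. In this sketch, one pass through the loop is one timestep.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v through n_steps timesteps of length dt."""
    f = force(x)
    for _ in range(n_steps):
        # Predict where the particle will be a short time dt in the future.
        x = x + v * dt + 0.5 * (f / mass) * dt * dt
        # Evaluate the force at the predicted position.
        f_new = force(x)
        # Advance the velocity using the average of the old and new forces.
        v = v + 0.5 * (f + f_new) / mass * dt
        f = f_new
    return x, v

# Harmonic oscillator with spring constant 1 and mass 1 (period 2*pi).
def spring(x):
    return -1.0 * x

dt = 0.01
steps_per_period = round(2 * math.pi / dt)
x, v = velocity_verlet(1.0, 0.0, spring, 1.0, dt, steps_per_period)
energy = 0.5 * v * v + 0.5 * x * x
print(f"after one period: x = {x:.4f}, energy = {energy:.6f}")
```

A smaller `dt` gives more accurate trajectories but needs more steps to cover the same physical time, which is why the number of timesteps per second limits how far a simulation can reach.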
With the ability to simulate and probe timescales two to three orders of magnitude longer than before, researchers can begin to explore phenomena that are out of reach on current supercomputers. High-fidelity simulations of complex systems could accelerate discoveries in fields such as drug design and biophysics, where understanding protein folding and interactions at the atomic level can lead to breakthroughs in developing new therapies for diseases. A better understanding of how materials behave under extreme conditions could also lead to more advanced, reliable materials for more efficient energy systems such as fusion reactors, according to Leon.
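The timescale gains come down to simple arithmetic: simulated time per day equals timesteps per second, times seconds per day, times the physical length of each timestep. The article reports 699,000 timesteps per second and about 300 microseconds of simulated time per day but does not state the timestep; the Python sketch below assumes roughly 5 femtoseconds, a value chosen here only because it makes those two reported figures roughly consistent. The function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
FS_PER_US = 1e9  # 1 microsecond = 1e9 femtoseconds

def simulated_us_per_day(steps_per_second, timestep_fs):
    """Simulated physical time, in microseconds, covered by one wall-clock day."""
    steps_per_day = steps_per_second * SECONDS_PER_DAY
    return steps_per_day * timestep_fs / FS_PER_US

def wall_clock_days(target_us, steps_per_second, timestep_fs):
    """Wall-clock days needed to reach target_us microseconds of simulated time."""
    return target_us / simulated_us_per_day(steps_per_second, timestep_fs)

# Reported rate from the article; the 5 fs timestep is an assumption.
print(f"{simulated_us_per_day(699_000, 5.0):.1f} microseconds per day")
print(f"{wall_clock_days(1_000.0, 699_000, 5.0):.1f} days to reach 1 ms")
```

Under the same assumed 5 fs timestep, reaching one millisecond of simulated time would take about 3.3 days of wall-clock time; a 1 fs timestep would cut the daily figure to about 60 microseconds.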
"With this capability, you could actually move from nanoseconds of time to microseconds or even milliseconds of time to try to understand a particular process within that material simulation," Leon said. "That's game-changing."
The work was conducted under the NNSA's Advanced Simulation and Computing (ASC) program's Advanced Memory Technology (AMT) project, which aims to sustain technology research and development momentum in the post-exascale world through industry engagement. The AMT project targets performance improvements of more than 40 times over exascale-class systems for NNSA applications within the next five years. Molecular dynamics is one of those applications, and it was studied first because it was the first with a complete code ready for performance measurement and publication, according to Oppelstrup.
According to the paper, the WSE's architecture also improves energy efficiency, by one to two orders of magnitude compared to traditional platforms. This efficiency is crucial as the scientific community increasingly seeks sustainable solutions to global challenges such as climate change and energy production, researchers said.
While describing their results with the WSE-2 as "exciting," Oppelstrup and other team members cautioned that the chip is not faster overall than exascale systems like Frontier; for certain smaller problems that fit on a single chip, however, the WSE could be the preferred route.
"The aggregate performance of today's exascale machines like Frontier, or [LLNL's] El Capitan is still much higher than a single - or even many - Cerebras machines," Oppelstrup said. "The Cerebras hardware shines for applications where many timesteps or iterations are required, and current machines cannot reach the required timescales. Besides porting applications to this new hardware, Cerebras machines can play an important role in adding more machine learning capabilities to current supercomputers."
Life beyond Moore's Law
With further MD code development and continuation of the NNSA Tri-Labs/Cerebras collaboration, Oppelstrup said he expects that many more mission applications could be ported to the Cerebras architecture and achieve extraordinary speedups. The team also anticipates expanding the current work to include detailed and accurate atomic interaction models, and to combine MD simulation with machine learning, at which the WSE also excels. This adaptability could enable even more sophisticated simulations, further improving understanding of complex materials and biological systems.
LLNL's Leon said the team wants to continue exploring the promise of the WSE architecture, programming it to run other types of simulations, like Monte Carlo or hydrodynamics simulations, which will require additional programming for the chip.
In the future, he added, supercomputing systems may regularly have different types of accelerators or nodes linked to them, and new architectures like the Cerebras WSE could improve performance on certain problems at a much more rapid pace. Successful scaling to facility-level machine deployments could also lead to an "even greater paradigm shift in the Top500 [world's fastest] supercomputer list than that introduced by the GPU revolution," researchers said in the paper.
Because the WSE is a new type of computing system, its development tools still need to mature, and efficient, accurate simulation codes will need further optimization, requiring computer scientists, computer architecture specialists and algorithm specialists to work together to get there, Leon said.
"With Moore's law slowing down, you're not going to see 1,000x [performance improvements] over exascale anymore. But if we want any sort of meaningful improvements, like a 40-times scale, we're going to be looking at architectures like Cerebras, which has demonstrated a potential to do this," Leon said. "There are a lot of challenges to address, but this [paper] demonstrates we can get there rapidly. It won't be 1000 times, but 100 times can be within reach for certain workloads. With Sandia, Los Alamos, Livermore national labs and Cerebras working together toward this lofty goal, I think we can get there fairly quickly."
Sponsored annually by the ACM, the Gordon Bell Prize recognizes outstanding achievement in high performance computing. The winner will be announced at the 2024 Supercomputing Conference (SC24), the premier conference for high performance computing, on Nov. 21 in Atlanta.