When chemists design new chemical reactions, one useful piece of information is the reaction's transition state, the point of no return from which a reaction must proceed.
This information allows chemists to try to produce the right conditions that will allow the desired reaction to occur. However, current methods for predicting the transition state and the path that a chemical reaction will take are complicated and require a huge amount of computational power.
MIT researchers have now developed a machine-learning model that can make these predictions in less than a second, with high accuracy. Their model could make it easier for chemists to design chemical reactions that could generate a variety of useful compounds, such as pharmaceuticals or fuels.
"We'd like to be able to ultimately design processes to take abundant natural resources and turn them into molecules that we need, such as materials and therapeutic drugs. Computational chemistry is really important for figuring out how to design more sustainable processes to get us from reactants to products," says Heather Kulik, the Lammot du Pont Professor of Chemical Engineering, a professor of chemistry, and the senior author of the new study.
Former MIT graduate student Chenru Duan PhD '22, who is now at Deep Principle; former Georgia Tech graduate student Guan-Horng Liu, who is now at Meta; and Cornell University graduate student Yuanqi Du are the lead authors of the paper, which appears today in Nature Machine Intelligence.
Better estimates
Every chemical reaction must pass through a transition state, which is reached when the reacting molecules attain the energy threshold needed for the reaction to proceed. These transition states are so fleeting that they're nearly impossible to observe experimentally.
As an alternative, researchers can calculate the structures of transition states using techniques based on quantum chemistry. However, that process requires a great deal of computing power and can take hours or days to calculate a single transition state.
"Ideally, we'd like to be able to use computational chemistry to design more sustainable processes, but this computation in itself is a huge use of energy and resources in finding these transition states," Kulik says.
In 2023, Kulik, Duan, and others reported on a machine-learning strategy that they developed to predict the transition states of reactions. This strategy is faster than using quantum chemistry techniques, but still slower than what would be ideal because it requires the model to generate about 40 structures, then run those predictions through a "confidence model" to predict which states were most likely to occur.
One reason why that model needs to be run so many times is that it uses randomly generated guesses for the starting point of the transition state structure, then performs dozens of calculations until it reaches its final, best guess. These randomly generated starting points may be very far from the actual transition state, which is why so many steps are needed.
The researchers' new model, React-OT, described in the Nature Machine Intelligence paper, uses a different strategy. In this work, the researchers trained their model to begin from an estimate of the transition state generated by linear interpolation, a technique that estimates each atom's position by placing it halfway between its position in the reactants and its position in the products, in three-dimensional space.
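To make the interpolated starting guess concrete, here is a minimal sketch in plain Python. It assumes the reactant and product structures list the same atoms in the same order (an atom mapping) and are already expressed in a common coordinate frame; real workflows may need to align the two structures first, which is not shown. The function name `linear_interpolation_guess` and its parameter `t` are hypothetical, and this is only the initial guess described above, not React-OT itself.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) position of one atom


def linear_interpolation_guess(
    reactant: List[Coord], product: List[Coord], t: float = 0.5
) -> List[Coord]:
    """Place each atom a fraction t of the way from its reactant position
    to its product position (t = 0.5 means halfway).

    Hypothetical helper: assumes both lists hold the same atoms in the
    same order and share one coordinate frame.
    """
    if len(reactant) != len(product):
        raise ValueError("reactant and product must list the same atoms in the same order")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        # Interpolate x, y and z independently for this atom.
        tuple(r + t * (p - r) for r, p in zip(r_xyz, p_xyz))
        for r_xyz, p_xyz in zip(reactant, product)
    ]


# Usage: a toy two-atom "reaction" in which the second atom moves from x = 1 to x = 3.
guess = linear_interpolation_guess(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
)
# guess == [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
```

The interpolated structure is generally not a real transition state; its value, as the researchers describe it, is that it starts the model much closer to the answer than a random guess would.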
"A linear guess is a good starting point for approximating where that transition state will end up," Kulik says. "What the model's doing is starting from a much better initial guess than just a completely random guess, as in the prior work."
Because of this, it takes the model fewer steps and less time to generate a prediction. In the new study, the researchers showed that their model could make predictions with only about five steps, taking about 0.4 seconds. These predictions don't need to be fed through a confidence model, and they are about 25 percent more accurate than the predictions generated by the previous model.
"That really makes React-OT a practical model that we can directly integrate to the existing computational workflow in high-throughput screening to generate optimal transition state structures," Duan says.
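The article does not say how prediction accuracy was scored. One common way to compare a predicted structure with a quantum-chemistry reference is the root-mean-square deviation (RMSD) of atomic positions, shown in the minimal sketch below. It assumes both structures list the same atoms in the same order and are already superimposed; a full comparison would usually first find the best rotation and translation (for example with the Kabsch algorithm), which is omitted here. The function name `rmsd` is illustrative, not taken from the paper.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) position of one atom


def rmsd(predicted: List[Coord], reference: List[Coord]) -> float:
    """Root-mean-square deviation between two already-aligned structures.

    Assumes atom i in `predicted` corresponds to atom i in `reference`;
    no superposition (rotation/translation fitting) is performed.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("structures must be non-empty and list the same atoms")
    total = 0.0
    for a, b in zip(predicted, reference):
        # Squared distance between corresponding atoms.
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(predicted))


# Usage: each of two atoms is displaced by 1 unit, so the RMSD is 1.0.
print(rmsd([(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]))
```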
"A wide array of chemistry"
To create React-OT, the researchers trained it on the same dataset that they used to train their older model. These data contain structures of reactants, products, and transition states, calculated using quantum chemistry methods, for 9,000 different chemical reactions, mostly involving small organic or inorganic molecules.
Once trained, the model performed well on other reactions from this set, which had been held out of the training data. It also performed well on other types of reactions that it hadn't been trained on, and could make accurate predictions involving reactions with larger reactants, which often have side chains that aren't directly involved in the reaction.
"This is important because there are a lot of polymerization reactions where you have a big macromolecule, but the reaction is occurring in just one part. Having a model that generalizes across different system sizes means that it can tackle a wide array of chemistry," Kulik says.
The researchers are now working on training the model so that it can predict transition states for reactions between molecules that include additional elements, including sulfur, phosphorus, chlorine, silicon, and lithium.
"To quickly predict transition state structures is key to all chemical understanding," says Markus Reiher, a professor of theoretical chemistry at ETH Zurich, who was not involved in the study. "The new approach presented in the paper could very much accelerate our search and optimization processes, bringing us faster to our final result. As a consequence, also less energy will be consumed in these high-performance computing campaigns. Any progress that accelerates this optimization benefits all sorts of computational chemical research."
The MIT team hopes that other scientists will use their approach to design their own reactions, and has created an app for that purpose.
"Whenever you have a reactant and product, you can put them into the model and it will generate the transition state, from which you can estimate the energy barrier of your intended reaction, and see how likely it is to occur," Duan says.
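The step Duan describes, going from a transition state to an energy barrier and a sense of how likely a reaction is, is commonly done with transition state theory. The sketch below is not part of React-OT or the team's app. It assumes you already have free energies for the reactants and the transition state in kJ/mol (for example, from a quantum chemistry calculation on the predicted structure), takes the barrier as their difference, and converts it to a rate constant with the Eyring equation, k = (k_B T / h) exp(-ΔG‡ / RT), with the transmission coefficient taken as 1. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # molar gas constant, J/(mol*K)


def barrier_kj_per_mol(g_reactant: float, g_transition_state: float) -> float:
    """Activation free energy: transition-state minus reactant free energy (kJ/mol)."""
    return g_transition_state - g_reactant


def eyring_rate_constant(barrier_kj_mol: float, temperature_k: float = 298.15) -> float:
    """Rate constant (1/s) from the Eyring equation, assuming a transmission
    coefficient of 1 and a free-energy barrier given in kJ/mol."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    prefactor = K_B * temperature_k / H  # attempt frequency, about 6e12 1/s near room temperature
    return prefactor * math.exp(-barrier_kj_mol * 1000.0 / (R * temperature_k))


# Usage: compare a lower and a higher barrier at room temperature.
for g_ts in (60.0, 90.0):
    barrier = barrier_kj_per_mol(0.0, g_ts)
    print(f"barrier {barrier:.0f} kJ/mol -> k = {eyring_rate_constant(barrier):.3e} 1/s")
```

Because the barrier sits in an exponent, small energy differences matter a lot: near room temperature, every additional 5.7 kJ/mol or so of barrier slows the reaction by roughly a factor of ten.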
The research was funded by the U.S. Army Research Office, the U.S. Department of Defense Basic Research Office, the U.S. Air Force Office of Scientific Research, the National Science Foundation, and the U.S. Office of Naval Research.