Enzymes are crucial to life. They are nature's little catalysts. In the gut, they help us digest food. They can enhance perfumes or get laundry cleaner with less energy. Enzymes also make potent drugs to treat disease. Scientists naturally are eager to create new enzymes. They imagine them doing everything from drawing greenhouse gases out of the skies to degrading harmful toxins in the environment.
That age-old quest for new enzymes just got a whole lot easier. A team of bioengineers and synthetic biologists has developed a computational workflow that can design thousands of new enzymes, predict how they will behave in the real world, and test their performance across multiple chemical reactions – all on a computer. Their results are published in a new paper in the journal Nature Communications.
"We've developed a computational process that allows us to engineer enzymes much faster, because we don't have to use living cells to produce the enzymes, as is now the case," said Michael Jewett, a professor of bioengineering at Stanford University and senior author of the new study. "Instead, we use machine learning to predict highly active designer enzymes that have been engineered from mutated DNA sequences modeled on the computer instead of created by hand in the lab. We can carry out these experiments in days rather than weeks or, as is often the case, months."
Old science, new models
Historically, scientists working to engineer new enzymes had to start with an enzyme already known to nature. Then, using real, genetically modified cells in the lab, they iteratively made changes to the enzyme to coax it into carrying out the desired chemistry.
The DNA needed for these enzyme variants must be purchased from a third-party vendor. The DNA must then be transferred manually into cells to produce the enzymes of interest, which then must be purified and tested across a range of chemical reactions. Sometimes, Jewett said, it can take thousands of iterations – perhaps even tens or hundreds of thousands – to try to find a single enzyme that might deliver the chemistry that a scientist is aiming to achieve.
"We can now do all that on a computer," he adds. "Rather than having to run 10,000 chemical reactions to iteratively improve enzyme activity, we can use machine learning models to predict highly active variants that still do just as well."
The science of enzyme engineering, which Jewett and colleagues know as "directed evolution," is not new; only the application of machine learning to the field is. They are shortcutting the process nature itself has gone through over the ages, as DNA mutates by chance and new enzymes emerge, sometimes with important consequences. Enzymes are, after all, just proteins made up of long strings of amino acids. DNA directs the production of the strings. Change the DNA; change the enzymes.
"It is the structure of the proteins – which is created from the sequence of those amino acids in the molecule – that leads to their function," Jewett said. "Directed evolution is a decades-old field that has developed the ability to mutate amino acids to change the function of the protein. We're just speeding up the process using machine learning and computers." A key feature of the team's workflow is the ability to synthesize and test protein enzymes in cell-free systems without living intact organisms, which further accelerates the process.
Future-focused
As a proof of concept, Jewett and colleagues used their new tool to synthesize a small-molecule pharmaceutical at 90% yield – up from an initial 10% yield – and showed that the tool can be applied to build multiple specialized enzymes in parallel to make eight additional therapeutics. He is now looking for a pharmaceutical partner to further develop the model. More broadly, Jewett's group is interested in expanding its machine learning models to guide catalysis or enzyme function across many different types of chemical reactions. In this paper, the team only looked at amide bond formation, a ubiquitous chemical reaction important in many different areas from pharmaceuticals to foods. But there are other opportunities.
"We could explore multiple opportunities in sustainability and the bioeconomy. You could begin thinking about classes of molecules that degrade toxins from the environment, enhance bioavailability of protein-rich foods, or others that take existing processes that require high pressures, costly components, or toxic reactions and make them faster, safer, and less expensive," Jewett said.
Jewett and colleagues' work was not without its roadblocks, most notably a lack of data. "High-quality, high-quantity functional data remains a challenge," he said. "We all know AI needs lots of data, and at this point it's just not there."
In directed evolution and biocatalysis, large datasets describing how enzyme variants perform in chemical reactions are rarely reported in the scientific literature, Jewett said. The process of generating the data is just too slow. But as scientists rely more and more on machine learning models to accelerate design, those data needs will only increase, he said, pointing to future work. In this study, Jewett was ultimately able to assess about 3,000 enzyme mutants across about 1,000 products and about 10,000 chemical reactions, but his data needs are orders of magnitude greater.
"If I wanted to mutate an enzyme to test tens of thousands of variants," Jewett said, providing a concrete example for scale, "I might find papers out there, but they may report mutant data for ten variants. Not hundreds. Not thousands. Not tens of thousands of reactions, but ten. So, we have a way to go on the data front, but we'll get there. This is the first step."
Contributing authors on this work include Grant M. Landwehr, Jonathan W. Bogart, Carol Magalhaes, Eric G. Hammarlund, and Ashty S. Karim at Northwestern University. This work was made possible in part by grants from the NCI Cancer Center, Department of Energy, Defense Threat Reduction Agency, National Institutes of Health, and LDRD Program at Sandia National Laboratories.