A study from Cornell researchers could enable a quantum leap forward in identifying and deciphering cancer-driving genetic mutations, the first step in developing effective therapeutics.
Cells become cancerous when they develop genetic mutations that drive uncontrolled growth. They then can replicate quickly and accumulate many more mutations. For researchers hoping to combat cancer, it's critical to differentiate between the mutations that are driving disease and the mutations that are just riding along.
The study, published Jan. 24 in Nature Communications, describes a comprehensive framework for analyzing such mutations. Called NetFlow3D, the framework has been applied to 33 different cancer types, leveraging the 3D structures of every human protein and all known interactions between proteins to decipher how mutations drive cancer.
"DNA transcribes into RNA and then translates into proteins. When mutations occur, the protein may fold differently, and that will affect its function," said co-lead author Yingying Zhang, Ph.D. '24, a former member of the lab of Haiyuan Yu, the Tisch University Professor of Computational Biology in the College of Agriculture and Life Sciences.
"But beyond that, these proteins also interact with one another, forming a network," said Zhang, now a postdoctoral researcher at Princeton University. "If a driver mutation disrupts a protein, it will propagate its effect among other proteins. We wanted to map these multi-scale functional effects to understand their mechanisms and their potential role in cancer."
Analyzing 3D clusters within protein structures is a well-established method to differentiate cancer-driving mutations from ride-alongs, but NetFlow3D is the first framework to incorporate the 3D structures of every known protein and protein-protein interaction in humans, Zhang said.
NetFlow3D is built upon the decadeslong efforts of governmental, academic and industry scientists from multiple disciplines. Genetic and molecular mapping of 33 different cancer types comes from The Cancer Genome Atlas Program, an initiative of the U.S. government's National Institutes of Health. Cambridge University researchers in the 1950s were the first to develop 3D models of proteins, a breakthrough that earned them a Nobel Prize in chemistry.
In the following decades, researchers painstakingly determined the 3D structures of additional proteins, but they have still described only about one-third of the roughly 20,000 known human proteins. That bottleneck began to ease in 2021, Zhang said, when Google DeepMind's AlphaFold 2 and other deep learning algorithms were developed, making it possible to predict the 3D structures of nearly all human proteins and their complexes.
"Combining the complementary insights of these fields, over this huge wealth of data, enabled us to build a comprehensive framework that reveals mutation patterns across proteins, their interactions and biological pathways," she said. "If we can really clearly map the mechanisms of all mutations we have observed in cancer, then in the future, we can improve precision, personalized medicine. "
The other co-lead author is Alden Leung, postdoctoral researcher in the Weill Institute for Cell and Molecular Biology. Also contributing was James Booth, professor and chair of the Department of Statistics and Data Science in the Ann S. Bowers College of Computing and Information Science.
The study was supported with funding from the National Institutes of Health and the Simons Foundation.
Krisy Gashler is a writer for the College of Agriculture and Life Sciences.