Protein Shapes Unlock Ancient Life's Mysteries

Center for Genomic Regulation

The three-dimensional shape of a protein can be used to resolve deep, ancient evolutionary relationships in the tree of life, according to a study in Nature Communications.

It is the first time researchers have combined data from protein shapes with data from genomic sequences to improve the reliability of evolutionary trees, a critical resource the scientific community uses to understand the history of life, monitor the spread of pathogens and create new treatments for disease.

Crucially, the approach works even with predicted structures, for proteins whose shapes have never been experimentally determined. It has implications for the massive amount of structural data being generated by tools like AlphaFold 2 and could help open new windows into the ancient history of life on Earth.

About 210,000 protein structures have been experimentally determined, compared with 250 million known protein sequences. Initiatives like the Earth BioGenome Project could generate billions more protein sequences in the next few years. The abundance of data opens the door to applying the approach on an unprecedented scale.

For many decades, biologists have been reconstructing evolution by tracing how species and genes diverge from common ancestors. These phylogenetic or evolutionary trees are traditionally built by comparing DNA or protein sequences and counting the similarities and differences to infer relationships.
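As a rough illustration of "counting the similarities and differences", the simplest sequence-based measure is the fraction of aligned positions at which two sequences differ, often called the p-distance. The sketch below is a generic textbook calculation, not the method used in the study; the function name and the toy sequences are made up for the example.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, non-gap positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical): one difference in four positions.
    print(p_distance("ACGT", "ACGA"))
```

A table of such pairwise distances for a set of sequences is the usual starting point for distance-based tree building.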

However, researchers face a significant hurdle – a problem known as saturation. Over vast timescales, genomic sequences can change so much that they no longer resemble their ancestral forms, erasing signals of shared heritage.
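Saturation can be made concrete with the classic Jukes-Cantor model for DNA, used here purely as an illustration and not taken from the study. Under that model, if two sequences have accumulated d substitutions per site, the expected fraction of sites that look different is 3/4 × (1 − e^(−4d/3)). The observed difference therefore flattens out near 75% no matter how much more change occurs.

```python
import math


def expected_p_distance(d: float) -> float:
    """Expected observed difference under Jukes-Cantor for true distance d.

    d is the number of substitutions per site separating two DNA sequences.
    """
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


if __name__ == "__main__":
    # The observed difference rises quickly, then plateaus just below 0.75.
    for d in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"true distance {d:>4}: observed difference {expected_p_distance(d):.4f}")
```

Once the curve is flat, very different true distances produce almost the same observed difference, which is why ancient relationships become hard to tell apart from sequences alone.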

"The issue of saturation dominates phylogeny and represents the main obstacle for the reconstruction of ancient relationships," says Dr. Cedric Notredame, researcher at the Centre for Genomic Regulation (CRG) and lead author of the study. "It's like the erosion of an ancient text. The letters become indistinct, and the message is lost."

To overcome this challenge, the research team turned to the physical structures of proteins. Proteins fold into complex shapes that determine their function in the cell. These shapes are more conserved over evolutionary time than the sequences themselves, meaning they change more slowly and retain ancestral features for longer.

The shape of a protein is dictated by its amino acid sequence. While sequences may mutate, the overall structure often remains similar to preserve function. The researchers hypothesised they could gauge how much structures diverge over time by measuring the distances between pairs of amino acids within a protein, known as intra-molecular distances (IMDs).

The researchers compiled a massive dataset of proteins with known structures, covering a wide range of species. They calculated the IMDs for each protein and used these measurements to construct phylogenetic trees.
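The idea of comparing proteins through their IMDs can be sketched in a few lines. This is a deliberately simplified illustration, not the study's actual formula: it assumes each protein is given as one 3D coordinate per residue, that the two proteins are already aligned residue by residue, and that dissimilarity is scored as the mean absolute difference between matching IMDs. All function names and coordinates are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]


def intra_molecular_distances(coords: List[Coord]) -> List[float]:
    """Distances between every pair of residues within one protein."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def imd_dissimilarity(coords_a: List[Coord], coords_b: List[Coord]) -> float:
    """Mean absolute difference between matching IMDs of two aligned proteins.

    Assumption: residue i of protein A corresponds to residue i of protein B.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("proteins must be aligned to the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")
    imd_a = intra_molecular_distances(coords_a)
    imd_b = intra_molecular_distances(coords_b)
    return sum(abs(x - y) for x, y in zip(imd_a, imd_b)) / len(imd_a)


if __name__ == "__main__":
    # Toy three-residue "proteins" (made-up coordinates).
    compact = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
    print(imd_dissimilarity(compact, stretched))
```

Because IMDs are measured inside each protein, the score does not depend on how the two structures are positioned or rotated in space. Computing it for every pair of proteins gives a distance table that a distance-based tree-building method could take as input.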

They found that trees built from structural data closely matched those derived from genetic sequences, but with a crucial advantage: the structural trees were less affected by saturation. This means they retained reliable signals even when genetic sequences had diverged significantly.

Recognising that both sequences and structures offer valuable insights, the team developed a combined approach which not only improved the reliability of the tree branches but also helped distinguish between correct and incorrect relationships.

"It's akin to having two witnesses describe an event from different angles," explains Dr. Leila Mansouri, coauthor of the study. "Each provides unique details, but together they give a fuller, more accurate account."

One practical example where the combined approach could make a significant impact is in understanding the relationships among kinases in the human genome. Kinases are proteins involved in many different important cellular functions.

"The genome of most mammals, including humans, contains about 500 protein kinases that regulate most aspects of our biology," says Dr. Notredame. "These kinases are major targets for cancer therapy, for example drugs like imatinib for humans or toceranib for dogs."

Human kinases have arisen through duplications occurring over the last billion years. "Within the human genome, the most distantly related kinases are about a billion years apart," says Dr. Notredame. "They duplicated in the common ancestor of the common ancestor of our common ancestor."

The vast timescale involved makes it incredibly difficult to build accurate gene trees that show how all these kinases are related. "Yet, as imperfect as it may be, the kinase evolutionary tree is widely used to understand how it interacts with other drugs. Improving this tree, or improving trees of other important protein families, would be an important advance for human health," adds Dr. Notredame.

The potential applications of the work go beyond cancer. Using the approach to create more accurate evolutionary trees could also improve our understanding of how diseases evolve more generally, aiding in the development of vaccines and treatments. Such trees could also help shed light on the origins of complex traits, guide the discovery of new enzymes for biotechnology, and even help trace the spread of species in response to climate change.
