Alongside the newly updated human genome, which fills in long-standing gaps to fully spell out the more than 3 billion letters that compose our genetic code, a separate companion study has shown it can serve as an accurate template that improves our DNA sequencing capabilities by leaps and bounds.
A group within the Telomere-to-Telomere (T2T) consortium - the initiative that completed the genome - led by the National Institute of Standards and Technology (NIST), Johns Hopkins University and the University of California, Davis, tested the full genome's ability to support the sequencing of DNA from thousands of people. In a new paper published in the journal Science, the researchers found that it corrected tens of thousands of errors produced by the previous rendition of the genome and was better for the analysis of more than 200 genes of medical relevance. The findings suggest that the T2T's genome could greatly propel research into genetic disorders, and that further in the future, patients might reap the benefits of more reliable diagnoses.
When clinicians and researchers sequence DNA to study or diagnose a genetic disorder, they use machines that produce strings of DNA, each mirroring a section of a patient's or research subject's genome. Then they compare those strings to a template, called a reference genome, to get an idea of what order to place them in.
"If sequencing DNA is like putting together a puzzle, then the reference genome is like the picture of the finished puzzle on the box. It helps guide you in putting together the pieces," said NIST biomedical engineer Justin Zook, a co-author of the study.
The most advanced reference genome prior to the T2T version lacks 8% of the genome, and certain sections, which have proved difficult for sequencing technologies to decode in the past, are riddled with errors.
These imperfections made the reference akin to a puzzle box picture having blanks and showing pieces in the wrong place. But thanks to technological and scientific advances made in genomics over the past two decades, the T2T consortium was able to fill in and clean up the human reference genome.
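The puzzle-box analogy can be made concrete with a toy sketch. It is an illustration only, not the consortium's method: the sequences are invented, the function name `place_reads` is hypothetical, and it uses exact string matching, whereas real alignment software tolerates mismatches and works on far longer sequences. The sketch shows that reads drawn from a region missing from the reference cannot be placed.

```python
def place_reads(reference, reads):
    """Place each read where it exactly matches the reference.

    Returns (placed, unplaced): placed is a list of (position, read)
    pairs sorted by position; unplaced lists reads with no match.
    """
    placed, unplaced = [], []
    for read in reads:
        pos = reference.find(read)  # -1 means the read matches nowhere
        if pos == -1:
            unplaced.append(read)
        else:
            placed.append((pos, read))
    return sorted(placed), unplaced


# Invented toy data: a "complete" reference and a copy with its end missing.
complete_reference = "ACGTTAGCCGATTACA"
incomplete_reference = complete_reference[:7]  # "ACGTTAG"

reads = ["CCGATT", "ACGTTA", "TTACA"]

placed_complete, unplaced_complete = place_reads(complete_reference, reads)
placed_incomplete, unplaced_incomplete = place_reads(incomplete_reference, reads)

print(placed_complete, unplaced_complete)
print(placed_incomplete, unplaced_incomplete)
```

With the complete toy reference every read finds a position. With the truncated one, the two reads from the missing stretch are left over, like puzzle pieces with no matching area on the box.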
Zook and the other study authors aimed to show just how much of a difference the finished reference would make in DNA sequencing.
The team found a proving ground for the reference in the 1000 Genomes Project (1KGP), an international effort that has amassed genetically diverse genome sequences from thousands of people from four different continents. Rather than starting from scratch and obtaining DNA from new subjects, the researchers were able to piece together the DNA segments already laid out by 1KGP.
The authors used computer programs to analyze 3,202 genomes with the T2T reference and compared the results to published work on these genomes that was performed with the previous reference. It became clear that genomes stitched together with one reference differed greatly in important regions from the same genomes stitched together with the other.
The T2T reference genome brought to light millions of genetic variations - stretches of DNA that differ from person to person - that the other reference had missed. It also washed away tens of thousands of blemishes in sequences, such as incorrectly located variations. In other words, the new variations filled in the blanks on the puzzle box picture and the corrections showed the right puzzle pieces where thousands were out of place before.
"What we found is that this new reference improved accuracy across the board. So, regardless of what the ancestry of the individual was, whether they were African, Caucasian or Asian, the new reference improved results for them," Zook said.
To understand the new reference's capabilities more thoroughly, the researchers attempted to use it to identify variation in 269 genes with either known or suspected connections to disease. These genes are tucked away in the regions of the genome that were previously challenging to decipher accurately.
The authors narrowed their focus to just one person characterized extensively by the NIST-led Genome in a Bottle Consortium, rather than thousands, to conduct this test. They performed a rigorous analysis of the genome of this individual, who had consented to publicizing their genetic code, using an array of powerful sequencing technologies backed by the new reference, Zook said.
For their efforts, they obtained a genomic benchmark - a highly accurate digital readout of the DNA in genes of interest - that can act as an answer key when evaluating sequencing methods.
The team paired each of the two references with three different sequencing technologies. No matter the approach, the T2T genome outperformed its predecessor, reducing errors by as much as a factor of 12 with one technology.
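The way a benchmark serves as an answer key can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the study's evaluation pipeline: the variants are invented, the function name `score_against_benchmark` is hypothetical, and variants are compared as exact (chromosome, position, reference base, alternate base) tuples, which real benchmarking does with considerably more care.

```python
def score_against_benchmark(benchmark, calls):
    """Compare a set of variant calls to a benchmark (answer key) set."""
    true_positives = len(benchmark & calls)   # called and in the benchmark
    false_positives = len(calls - benchmark)  # called but not in the benchmark
    false_negatives = len(benchmark - calls)  # in the benchmark but missed
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "errors": false_positives + false_negatives,
    }


# Invented toy benchmark: (chromosome, position, reference base, alternate base).
benchmark = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 90, "T", "C"),
}

# Invented calls from a hypothetical sequencing method:
# three correct, one misplaced (chr2:95 instead of chr2:90).
calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 95, "T", "C"),
}

result = score_against_benchmark(benchmark, calls)
print(result)
```

In this toy case the misplaced variant counts twice, once as a false positive and once as a missed benchmark variant. Running the same scoring on calls made with two different references gives error counts that can be compared directly.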
The T2T reference genome rounds out the mapping of our genetic blueprint, marking a pivotal milestone in the field of genomics. Researchers across the field will now be able to explore areas in the genome that were off limits in the past and begin to understand how scores of genes relate to different diseases. But according to Zook, there is still more work to do before clinics put it into practice.
By all indications thus far, the T2T reference is more accurate than the current reference. However, researchers have used the current reference to analyze millions of genomes, gaining a deep well of knowledge that is essential for properly interpreting results when using it. Experts will need to grasp the ins and outs of the new reference in the same way to move forward.
"I think there'll definitely be a lot more work to understand the accuracy of DNA sequences of many individuals in regions of the genome that this reference now makes accessible," Zook said.
Paper: Sergey Aganezov, Stephanie M. Yan, Daniela C. Soto, Melanie Kirsche, Samantha Zarate, Pavel Avdeyev, Dylan J. Taylor, Kishwar Shafin, Alaina Shumate, Chunlin Xiao, Justin Wagner, Jennifer McDaniel, Nathan D. Olson, Michael E.G. Sauria, Mitchell R. Vollger, Arang Rhie, Melissa Meredith, Skylar Martin, Joyce Lee, Sergey Koren, Jeffrey A. Rosenfeld, Benedict Paten, Ryan Layer, Chen-Shan Chin, Fritz J. Sedlazeck, Nancy F. Hansen, Danny E. Miller, Adam M. Phillippy, Karen H. Miga, Rajiv C. McCoy, Megan Y. Dennis, Justin M. Zook and Michael C. Schatz. A complete reference genome improves analysis of human genetic variation. Science. Published online March 31, 2022. DOI: 10.1126/science.abl3533