Exploring Known and Unknown Microbial Space

DOE/Lawrence Berkeley National Laboratory

In Science Advances, researchers at the U.S. Department of Energy (DOE) Joint Genome Institute (JGI), a DOE Office of Science User Facility located at Berkeley Laboratory, have taken stock of the current state of microbial genomic biodiversity. Using publicly available genome sequence data generated over the past three decades, their study assesses what fraction of microbial diversity is already known and proposes a path forward to curate and cultivate what is still unknown.

"We took a deep dive into over 1.8 million bacterial and archaeal genomes to see how much of their diversity we've actually captured," said co-first author Dongying Wu, a member of the Functional Annotation team within the JGI's Microbiome Data Science group. "Turns out that despite all the genomes we've sequenced, we've only scratched the surface."

The study, added co-first author Rekha Seshadri, is a wake-up call to revive the art of hands-on microbiology and experimental validation.

Microbes run the world, playing key roles in regulating global nutrient cycles. The information gained from understanding the interactions between microbes, hosts, and their environments could be applied to multiple research areas including agriculture, biofuels and bioproducts, and medicine. Wading through 30 years of sequencing data, the team, composed of Wu, Seshadri, Nikos Kyrpides and Natalia Ivanova, conducted a census of publicly available bacterial and archaeal sequences. Kyrpides heads the JGI's Microbiome Data Science group, and Ivanova leads the Functional Annotation team.

MAGs significantly expanded the estimated diversity of Bacteria and Archaea

With phylogenetic diversity standing in as a representative measure of biodiversity, the team analyzed five universally conserved, protein-coding marker genes across nearly 2 million genomes, including isolates, metagenome-assembled genomes (MAGs) with varying quality scores representing potential genomes, and nearly 44,000 metagenomes. All of the data are publicly available in the National Center for Biotechnology Information (NCBI) GenBank collection and on the JGI's Integrated Microbial Genomes & Microbiomes (IMG/M) database.

In their analysis, the team found that bacterial isolate genomes represent 9.73% of the total estimated diversity of the available datasets. Efforts to recover MAGs by extracting data directly from environmental samples over the years have significantly expanded the known diversity of microbial genomes, accounting for nearly 49% of the total estimated bacterial diversity. The team conservatively estimated that about 42% of the bacterial diversity has no genomic representation in the public databases.
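The article does not spell out how these shares were calculated. As a rough illustration only, one common way to express phylogenetic diversity is Faith's PD: the total branch length of a tree that is covered by a set of organisms. The sketch below assumes that measure and applies it to a tiny, made-up tree; the tree, branch lengths, source labels and function names are all hypothetical and are not taken from the study.

```python
# Toy tree (hypothetical): node -> (parent, branch length to parent).
TREE = {
    "A": ("n1", 1.0),
    "B": ("n1", 1.0),
    "C": ("n2", 2.0),
    "D": ("n2", 3.0),
    "n1": ("root", 1.0),
    "n2": ("root", 2.0),
}

# Hypothetical label saying where each leaf's sequence comes from.
SOURCE = {"A": "isolate", "B": "mag", "C": "mag", "D": "metagenome_only"}


def phylogenetic_diversity(leaves):
    """Sum branch lengths on the union of leaf-to-root paths (Faith's PD)."""
    covered = set()
    for node in leaves:
        # Walk toward the root; stop early once we reach a covered branch,
        # because everything above it has already been counted.
        while node in TREE and node not in covered:
            covered.add(node)
            node = TREE[node][0]
    return sum(TREE[n][1] for n in covered)


def diversity_shares():
    """Split total diversity into isolate, MAG-added and unrepresented shares."""
    isolates = [leaf for leaf, src in SOURCE.items() if src == "isolate"]
    genomes = [leaf for leaf, src in SOURCE.items() if src in ("isolate", "mag")]
    total = phylogenetic_diversity(SOURCE)
    pd_isolates = phylogenetic_diversity(isolates)
    pd_genomes = phylogenetic_diversity(genomes)
    return {
        "isolates": pd_isolates / total,
        "added_by_mags": (pd_genomes - pd_isolates) / total,
        "no_genome": (total - pd_genomes) / total,
    }


if __name__ == "__main__":
    for label, share in diversity_shares().items():
        print(f"{label}: {share:.0%}")
```

In this toy case the three shares add up to the whole tree by construction, which mirrors how the isolate, MAG and unrepresented percentages above partition the total estimated diversity.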

Over the past few decades, advancements in sequencing technologies have led to an abundance of microbial genomes becoming available to the global research community. In a similar analysis for Archaea, the team found that isolate genomes represent only 6.55% while MAGs account for about 57% of the estimated diversity of the available datasets. This leaves 36% of archaeal diversity with no genomic representation.

"When it comes to isolate genome data, our real-world touchstones, we're just scratching the proverbial Petri dish," Seshadri said. "It's a reminder of the urgency to cultivate new microbial species."

Targeted Explorations for Understanding the Microbial World

Corresponding author Ivanova said that the JGI continues to work on increasing the genomic representation of bacterial and archaeal diversity, shedding light on the unknown microbial space. Although MAGs significantly expanded the known diversity of microbial genomes in the datasets, the team added that this information is still computationally derived. Experimental studies on cultivated isolates are needed to convert the potential implications into applied science, contributing toward a sustainable bioeconomy.

"While computationally derived MAGs have been a revolutionary tool for microbiology, it's a call to balance," added Seshadri. She noted that the metagenomic datasets used in this study could help researchers improve their chances of recovering specific isolate species for culture. "We've drawn out the treasure map," she said. "Basically we can point specifically to environmental samples where people can go and reinvest time and effort in recovery."

The team plans to convene a discussion later this year related to efforts to increase the number of cultured isolates. If you're interested in learning more, fill out this survey.
