EMBL-EBI Unveils Open Data for Biodiversity, Climate

EMBL-EBI data resources help advance biodiversity and climate change research by enabling scientists to study species interactions, evolutionary processes, ecosystem health, and more

EMBL-EBI's open data resources for biodiversity and climate research. Image credit: Karen Arnott/EMBL-EBI

Three-quarters of Earth's land-based environment and about 66% of its marine environment have already been significantly altered by human actions. In the face of climate change and accelerating biodiversity loss, open access to data has become vital for advancing scientific research and developing sustainable solutions for the future of our planet.

At EMBL's European Bioinformatics Institute (EMBL-EBI), we host a range of data resources that support biodiversity and climate change research. These resources enable scientists to explore and analyse data from all species, which in turn helps them understand the genetic diversity of those species, study different ecosystems, and inform conservation strategies. Here, we look at some key resources and how they contribute to biodiversity and climate change research.

The Biodiversity Portal and related data portals

EMBL-EBI's Biodiversity Portal is the first port of call for scientists who want to access biomolecular data for their biodiversity research. The portal was created as part of EMBL's Planetary Biology transversal theme. It pulls together information from different data resources and serves as a central hub for the genomic data from various global biodiversity initiatives.

One of the portal's key features is its ability to track sequencing progress and provide status updates for various species. It also includes interactive tools, such as sampling maps and phylogeny browsers, that help scientists visualise the relationships and geographic distributions of different organisms.

MGnify

Microbial communities, or microbiomes, are essential to the health of ecosystems. They play key roles in nutrient cycling, decomposition, and the health of plants and animals. MGnify, our microbiome data resource, makes data, tools, and pipelines freely available to help researchers understand the interactions and functions of microbes in various environments and ecosystems.

Researchers can use MGnify to compare microbial communities across different habitats, study the effects of pollutants, and even explore potential biotechnological applications, such as bioremediation and sustainable agriculture. MGnify also collects, stores, and makes available data from environmental projects, including BlueRemediomics and AtlantECO, to aid the study of marine microbiomes and their contribution to ocean health. In addition, MGnify is a major contributor of data to the Global Biodiversity Information Facility (GBIF).

Ensembl Rapid Release

Ensembl is our genomic data resource for high-quality genome annotations, tools, and services for vertebrates and model organisms. Ensembl Rapid Release, part of Ensembl, is designed to provide quick access to the latest genome sequences. This platform is tailored to meet the growing demands of biodiversity research, offering scientists immediate access to newly sequenced genomes from initiatives including the Earth BioGenome Project (EBP), Darwin Tree of Life (DToL), the European Reference Genome Atlas, and many more.

Unlike the main Ensembl genome browser, which updates every three months, Rapid Release updates every two weeks. This rapid availability is essential for biodiversity projects that aim to catalogue and understand species before they potentially become extinct. Ensembl also hosts data from different biodiversity initiatives on project pages where users can find datasets and analysis tools to support their research.

UniProt

UniProt provides information about protein sequences and their functions. This resource supports biodiversity research by helping scientists to explore the structures and functions of proteins from countless species, gaining insights into their biology and evolution.

UniProt's taxonomy database is also beneficial for biodiversity research. It helps scientists navigate complex relationships between species and track evolutionary lineages. By linking protein data to specific organisms, researchers can investigate how proteins vary between species and how these differences contribute to unique characteristics and adaptations.
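As a minimal sketch of what linking protein data to a specific organism can look like in practice, the snippet below assembles a taxon-restricted query for UniProt's public REST service. The endpoint and the `taxonomy_id` query field reflect our understanding of UniProt's REST documentation and should be checked against the current version. The helper name `uniprot_taxon_url`, the chosen return fields, and the example taxon ID are illustrative assumptions and do not come from EMBL-EBI. The code only builds the URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Assumed base URL of the UniProtKB search endpoint; verify against current docs.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_taxon_url(taxon_id: int, size: int = 5) -> str:
    """Build a UniProtKB search URL restricted to one NCBI taxonomy ID."""
    params = {
        "query": f"taxonomy_id:{taxon_id}",
        # Return fields are an illustrative choice, not a required set.
        "fields": "accession,organism_name,protein_name",
        "format": "json",
        "size": size,
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # 9606 is the NCBI taxonomy ID for Homo sapiens; substitute any taxon of interest.
    print(uniprot_taxon_url(9606))
```

Opening the printed URL in a browser, or passing it to an HTTP client, should return a small batch of matching entries, provided the service accepts these parameters as assumed.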

European Nucleotide Archive

The European Nucleotide Archive (ENA) acts as a comprehensive repository for nucleotide sequence data. This open-access archive supports biodiversity research by aiding the storage, access, and analysis of nucleotide sequences from global biodiversity projects such as the EBP, DToL, AtlantECO, and other initiatives.

The database also commits to data quality and standardisation. By logging detailed metadata for each submission, ENA ensures that all biodiversity data are well documented and easily searchable. This adherence to the FAIR principles (Findable, Accessible, Interoperable, Reusable) increases the reusability of these data, facilitating reproducible and collaborative biodiversity research. The data at ENA are increasingly connected with other biodiversity data resources, including taxonomic and specimen databases, publishers, and biodiversity data aggregators such as GBIF.
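To illustrate what "easily searchable" metadata can mean, here is a hedged sketch that builds a sample-metadata query for the ENA Portal API. The endpoint, the `tax_tree()` query function, and the `sample` result type follow our reading of the ENA Portal API documentation. The helper name `ena_sample_url`, the selection of metadata fields, and the example taxon ID are assumptions made for illustration. As written, the code constructs the request URL only and performs no network access.

```python
from urllib.parse import urlencode

# Assumed ENA Portal API search endpoint; verify against current docs.
ENA_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def ena_sample_url(taxon_id: int, limit: int = 10) -> str:
    """Build a URL that requests sample metadata for a taxon and its descendants."""
    params = {
        "result": "sample",
        # tax_tree(N) is understood to match taxon N and everything beneath it.
        "query": f"tax_tree({taxon_id})",
        # Field names are illustrative; available fields depend on the result type.
        "fields": "sample_accession,scientific_name,country,collection_date",
        "format": "json",
        "limit": limit,
    }
    return f"{ENA_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # 3702 is the NCBI taxonomy ID for Arabidopsis thaliana.
    print(ena_sample_url(3702, limit=3))
```

Requesting collection date and country alongside the accession is one way to see, record by record, how much provenance metadata a submission carries.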

Protein Data Bank in Europe (PDBe)

The Protein Data Bank in Europe (PDBe) collaborates globally to collect, organise, and make available data for the 3D structures of proteins, nucleic acids, and complex assemblies. These data are important for understanding the molecular basis of biological processes and how proteins interact within cells.

This resource is valuable for biodiversity research as it allows scientists to explore protein structures across a wide range of species. Understanding protein structures helps researchers identify how proteins perform their functions, evolve, and adapt to different environmental conditions. By linking structural data to functional and evolutionary information, PDBe supports the study of the molecular mechanisms underlying biodiversity.

AlphaFold Database

The AlphaFold Database contains predicted protein structures generated using Google DeepMind's AlphaFold2 AI algorithm. This AI-powered resource has transformed the field of structural biology and has the potential to make a big impact on biodiversity research.

The database contains predictions for nearly every catalogued protein known to science, from a wide range of species, including bacteria, plants, and animals. By providing these structures openly and freely, the AlphaFold Database helps scientists explore the molecular mechanisms that contribute to the evolution of biodiversity.

BioSamples

BioSamples stores and provides detailed descriptions and metadata for biological samples from various sources, including other databases such as ENA. Access to these rich metadata helps researchers deepen their understanding of biological diversity and improve the accuracy and reproducibility of their studies. For biodiversity research, access to these metadata means that scientists can accurately trace the origins of their samples, understand the environmental and experimental conditions associated with their collection, and compare them with other datasets.

EMBL-EBI also manages data portals for large international biodiversity projects.

Online training: Exploring microbial ecosystems

Explore microbial communities and metagenomics with our free on-demand training collection.


Public Release. This material from the originating organization/author(s) might be of the point-in-time nature, and edited for clarity, style and length. Mirage.News does not take institutional positions or sides, and all views, positions, and conclusions expressed herein are solely those of the author(s).