The order Primates includes not only our closest relatives on Earth, the seven great apes, but also over 450 species of monkeys, lemurs, lorises, and galagos. Primates are fantastically diverse, from 400-pound gorillas to mouse lemurs (Microcebus) weighing just a single ounce. They exhibit some of the most remarkable behaviors observed in nature: chimpanzees 'fish' for termites in hollow logs using specially selected sticks, while orangutans use leaves as gloves to handle spiky durian fruit.
Primates are some of the most intensely studied species on Earth, and yet there is no comprehensive molecular phylogenetic hypothesis of primate evolutionary history that summarizes the pattern and timing of all primate relationships. Such a phylogenetic tree would use molecular sequence data to tell us both when each species or group of species first appeared and which other groups on the tree are their closest relatives. To date, the largest timed molecular phylogenetic tree of life, called a 'timetree', includes just over 200 primate species, while the largest synthetic timetree, drawing from over 4,000 published studies, includes barely double that count, leaving about a fifth of the primate tree of life unresolved.
Why we need complete evolutionary trees
The value of timed evolutionary trees containing every species of a given lineage cannot be overstated. Such trees are intrinsically compelling, as they capture the evolutionary history that gave us our present biodiversity, but they also form essential foundations for many types of future work. For example, taxonomic and systematic efforts to catalog species rely on them to identify new lineages. Studies of the rate of evolution and its possible correlates, like climate and geological changes, are fundamentally tied to their underlying phylogenies. Fields like biogeography, phylogeography, and historical ecology, which use timetrees to investigate spatial or ecological patterns, would be impossible without a phylogeny. And, as we watch global biodiversity slip away amid ongoing extinction events, phylogenies are essential tools in identifying conservation priorities and assessing the impacts of our efforts to preserve species.
How common are complete phylogenies?
Since comprehensive molecular phylogenies are such valuable tools, it may be surprising to learn that they are rare. The NCBI taxonomy database currently includes molecular sequences for almost 500,000 species, while the TimeTree of Life, the largest database of published timed phylogenies, includes about 150,000 species. In exploring the collection of studies included in the database, we discovered that most phylogenies tend to be small, encompassing only 25 species on average. These trees are the work of people dedicated to studying groups of closely related organisms, like genera or families, who prioritize resolution in their study system over broader scale. Thus, the need for a complete tree of life will only be met if we can find a way to bring these efforts together.
A new way forward
While large, fully timed trees with molecular sequence data for all species are rare, we have found that the materials to build them are common. For one, untimed phylogenies greatly outnumber timed phylogenies in the literature, even among papers published in the last ten years. With only one or a few calibrations, they can become valuable components of the global timetree of life. Even though many species have never been incorporated into a molecular phylogeny, there is often corresponding molecular data deposited in repositories like NCBI GenBank, where DNA sequence information is freely accessible to researchers. These two sources of data represent a fantastic opportunity to build comprehensive timetrees.
Over the course of several publications, we have developed a supertree-building approach with three stages. First, we assemble all published, timed phylogenies that include the species of interest. Second, we search for untimed trees that include any remaining species and time them ourselves using literature-consensus secondary calibrations. Finally, for species still missing, we assemble de novo alignments from publicly available data and use them to build new timed phylogenies.
In our most recent effort, this search uncovered enough data to build a new synthetic supertree of 455 primates, 98% of all those present in the NCBI taxonomy, and 55 more than were already present in TimeTree. Our new timetree represents the most complete description of the evolutionary relationships among primates to date.
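The tiered search described above can be pictured as a simple fall-through: each species is taken from the best available source, and only the species still missing are passed to the next stage. The sketch below is a hypothetical illustration of that logic, not the authors' actual pipeline; the function name `assign_tiers`, the tier labels, and the placeholder species identifiers are all assumptions made for this example.

```python
from typing import Dict, Iterable, List, Set, Tuple


def assign_tiers(
    targets: Iterable[str],
    tiers: List[Tuple[str, Set[str]]],
) -> Tuple[Dict[str, str], Set[str]]:
    """Assign each target species to the first tier that covers it.

    `tiers` is ordered from most to least preferred data source.
    Returns the species-to-tier assignment and the set of species
    that no tier could cover.
    """
    remaining = set(targets)
    assignment: Dict[str, str] = {}
    for tier_name, available in tiers:
        covered = remaining & available
        for species in covered:
            assignment[species] = tier_name
        # Only species not yet placed fall through to the next tier.
        remaining -= covered
    return assignment, remaining


# Placeholder identifiers and coverage, invented purely for illustration.
targets = {"sp_a", "sp_b", "sp_c", "sp_d", "sp_e"}
tiers = [
    ("published_timed_tree", {"sp_a", "sp_b"}),
    ("untimed_tree_plus_calibration", {"sp_b", "sp_c"}),
    ("de_novo_from_sequences", {"sp_c", "sp_d"}),
]

assignment, unplaced = assign_tiers(targets, tiers)
```

Ordering the tiers this way means a species is only re-timed or re-aligned when no published timed tree already includes it, and anything left in `unplaced` is a species with no usable data in any source.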
Completing the TimeTree of Life
This effort has demonstrated that while the evolutionary history of even some of the most charismatic species on Earth has remained incompletely understood, we have the tools to fill much of this gap in knowledge. We envision our research protocol as an accessible and, ultimately, extremely valuable tool in our efforts to understand evolution. Complete timetrees are a foundational resource in many fields, and we have discovered that they can often be built from existing data.
Furthermore, such complete timetrees allow us to test hypotheses that we could not otherwise test. For example, in the present study, we asked which of two explanations better accounts for the numbers of species in different primate clades: unique speciation rates, with some primate lineages generating new species much faster than others, or simply time, with all lineages making new species at about the same rate and older lineages accruing more species. We found that the major groups of primates did in fact all share relatively similar rates of speciation, and that their age was therefore a better predictor of their species richness. This analysis would be quite problematic if we were missing many species or dates in our timetree, so it serves as a perfect example of the utility of large, complete timetrees.
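To make the "rate versus time" comparison concrete, here is a minimal sketch under a simple pure-birth assumption, in which a clade with stem age t and speciation rate r is expected to contain exp(r·t) species, so r can be estimated as ln(N)/t. This is a standard textbook estimator chosen for illustration and is not necessarily the method used in the study. The clade names, ages, and species counts are invented placeholders, not results.

```python
import math
from typing import Dict, Tuple


def pure_birth_rate(n_species: int, stem_age_myr: float) -> float:
    """Estimate a speciation rate as ln(N) / t under a pure-birth model."""
    if n_species < 1 or stem_age_myr <= 0:
        raise ValueError("need at least one species and a positive age")
    return math.log(n_species) / stem_age_myr


def expected_richness(rate: float, stem_age_myr: float) -> float:
    """Expected number of species after `stem_age_myr` at a constant rate."""
    return math.exp(rate * stem_age_myr)


# Hypothetical clades: (species count, stem age in millions of years).
clades: Dict[str, Tuple[int, float]] = {
    "clade_A": (20, 30.0),
    "clade_B": (90, 45.0),
    "clade_C": (400, 60.0),
}

rates = {name: pure_birth_rate(n, t) for name, (n, t) in clades.items()}

# If the per-clade rates are nearly equal, one shared rate plus clade age
# is enough to account for the differences in richness.
rate_spread = max(rates.values()) / min(rates.values())
shared_rate = sum(rates.values()) / len(rates)
predicted = {
    name: expected_richness(shared_rate, t) for name, (_, t) in clades.items()
}
```

In this made-up example the three rates come out almost identical, so the twentyfold difference in richness between `clade_A` and `clade_C` follows from age alone. A large `rate_spread` would instead point to lineage-specific speciation rates. Both estimates depend on accurate species counts and dates, which is why missing taxa would undermine the comparison.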