An advanced genomic analysis of a multigenerational family is providing new knowledge about genetic mutations and their transmission, both the variants that are inherited and those that arise anew.
The findings are published today, April 23, in Nature.
"We sequenced and assembled the chromosomes of multiple members of a large, four-generation family to understand how the genetic information changed from generation to generation," said Evan E. Eichler, professor of genome sciences at the University of Washington School of Medicine in Seattle and the corresponding author of the paper.
During the study, lead author David Porubsky was a postdoctoral fellow at the UW. Porubsky is now at the European Molecular Biology Laboratory in Germany and continues to work with the Eichler lab. Several scientists at the University of Utah and at PacBio also helped lead the project.
The DNA for most of the family was extracted from peripheral whole blood leukocytes and was made available both as primary material and cell lines. However, because members of the great-grandparent generation are no longer living, their DNA was only available as cell lines.
The use of several sequencing technologies and the nearly complete end-to-end assemblies of genomes from this family created one of the most comprehensive, publicly available 'truth sets' of all classes of genome variants, according to the project report.
The researchers wanted to explore some of the most mutable regions of the genome. They also studied where changes occur relative to recombination events, in which chromosomes exchange DNA code or reshuffle their own code.
The researchers explained the importance of examining how newly arisen (or de novo) mutations occur and of learning how they might be passed along to succeeding generations. De novo mutations are key to understanding genetic disease, variations in human characteristics, and human evolution. De novo mutations can also introduce new traits into a family, such as a child's eye color not seen in previous generations.
In their latest study of human genetic variations and de novo mutation patterns, the scientists generated reference genomes for 28 family members representing four generations – great-grandparents, grandparents, parents, and children. The anonymous family began participating in genetics studies more than three decades ago. Their data has been extensively used in research, and scientists continue to recruit other members of this family.
The researchers employed five different state-of-the-art sequencing technologies and several advances in chromosome assembly algorithms to build this multigenerational family genome resource. For example, they were able to generate both short reads and long reads through their DNA sequencing. Combining sequencing technologies helps avoid inaccuracies and difficulties in interpreting information that can occur when piecing together only short segments of genetic code, which is like tackling a jigsaw puzzle composed of tiny parts instead of larger ones.
Previous attempts to examine the rates of de novo mutations in humans based just on short reads excluded the most repetitive – and most mutation-prone – regions of the human genome.
The combination of technologies increased the researchers' ability to evaluate all types of variations while analyzing the family's DNA codes. These variations included deletions, insertions, inversions, and tandem repeats, which are repetitions that occur next to each other. Moreover, the approach gave researchers access not previously available to some of the more complex areas of the human genome, including the Y, or male, chromosome. These hard-to-explore regions are significant locations for mutational changes.
"What was most exciting to us was the finding that the rate of new mutations varied by over twenty-fold depending on where you were in the genome," said Eichler.
The scientists noted that most previous studies of new mutations depended solely on short-read DNA sequencing from trios to compare father, mother and child.
Having multiple generations in this latest study enabled researchers to confirm the transmission of mutations in the family. Working in the other direction, the scientists could more effectively trace back a mutation in an offspring to determine whether and where it was present in previous generations.
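The logic of tracing a mutation across generations can be illustrated with a minimal sketch. It is not the study's pipeline: the variant identifiers, the toy family, and both function names are hypothetical, and real analyses work on phased genome assemblies and must account for sequencing error. The sketch only shows the idea that a variant seen in a child but in neither parent is a candidate de novo mutation, and that finding it again in the next generation supports its being real and heritable.

```python
from typing import Dict, List, Set


def candidate_de_novo(child: Set[str], mother: Set[str], father: Set[str]) -> Set[str]:
    """Return variants carried by the child but by neither parent."""
    return child - mother - father


def carriers_in_next_generation(
    candidates: Set[str], offspring: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """For each candidate variant, list the offspring who also carry it."""
    return {
        variant: sorted(name for name, variants in offspring.items() if variant in variants)
        for variant in sorted(candidates)
    }


# Hypothetical toy data: each person is represented by a set of variant IDs.
mother = {"v1", "v2"}
father = {"v3"}
child = {"v1", "v3", "v4", "v5"}
grandchildren = {"grandchild_a": {"v4"}, "grandchild_b": {"v1"}}

candidates = candidate_de_novo(child, mother, father)
transmission = carriers_in_next_generation(candidates, grandchildren)
print(transmission)
```

In this toy family, both v4 and v5 are candidates, but only v4 is observed in a grandchild. A variant that is never transmitted is not necessarily an error; it may simply not have been passed on, or it may be a postzygotic change present in only some of the child's cells.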
Beyond that, the multigenerational resource and the modern technologies allowed the researchers to distinguish between mutations that came from the parents and those that arose in their offspring after they were conceived. At conception, the fertilized egg becomes a single cell called a zygote, ready to begin dividing to form an embryo. Although these changes are referred to as postzygotic mutations, they can occur any time after fertilization: during embryonic development and throughout the lifespan. In fact, some postzygotic changes in the genomes of some human cells are thought to be linked to aging.
The researchers discovered that the rate of de novo mutations was higher than previously estimated for this family. They now estimate about 98 to 206 de novo mutations per generation.
They found that approximately 16% of the de novo mutations were postzygotic and evenly split between mutations in genes from the mother and genes from the father. In contrast, more than 81% of the de novo mutations that occurred before fertilization came from the father. Paternal age was a significant factor in those mutations, which are more common in the offspring of older fathers. But paternal age seemed to have no effect on the postzygotic mutations.
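To make these proportions concrete, here is a rough worked example. The total of 100 de novo mutations is a hypothetical round number chosen for illustration (it sits near the low end of the 98 to 206 range reported above), the function name is made up, and 81% is used as a lower bound because the article says "more than 81%."

```python
from typing import Dict


def split_de_novo(
    total: float,
    postzygotic_fraction: float = 0.16,        # "approximately 16%" in the article
    paternal_germline_fraction: float = 0.81,  # "more than 81%", used as a lower bound
) -> Dict[str, float]:
    """Split a hypothetical total of de novo mutations into the reported classes."""
    postzygotic = total * postzygotic_fraction
    germline = total - postzygotic  # arose before fertilization
    paternal_germline = germline * paternal_germline_fraction
    return {
        "postzygotic": postzygotic,
        "germline": germline,
        "paternal_germline": paternal_germline,
        "maternal_germline": germline - paternal_germline,
    }


# Hypothetical total, chosen only to make the percentages easy to read.
breakdown = split_de_novo(100)
for label, count in breakdown.items():
    print(f"{label}: {count:.1f}")
```

Under these assumptions, roughly 16 of the 100 mutations would be postzygotic and about 84 would have arisen before fertilization, with at least 68 of those coming from the father.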
Recombination events were more likely to originate through exchanges or reshuffling of genes on the mother's chromosomes. Paternal recombination events, when they did occur, tended to happen towards the ends of the chromosomes. Although this was only one family, the scientists were surprised to find that fewer recombination events occurred as the parents aged.
The scientists determined that the de novo mutation rate varied across different regions of the genome. Areas that had more repeat content were more likely to generate de novo mutations. These included centromeres (regions on the chromosome that control cell division) and segmental duplications (blocks of DNA that occur in more than one region of the genome). They observed that the tightly packed regions of the Y chromosome had some of the highest rates of de novo mutations.
Tandem repeats were some of the hottest regions for de novo mutations. In the family being studied, there were 32 of these hotspots where recurrent mutations occurred, including 16 that expanded or contracted three or more times.
The researchers think that this genome resource created from the multigenerational family pedigree will be useful for future genetic studies, such as benchmarking new sequencing technologies or studying even more complex forms of variation and how they are passed down or changed from one generation to the next.
However, they pointed out several limitations to this resource, such as regions of the short arms of five chromosomes (13, 14, 15, 21, and 22) that have not yet been fully resolved, largely because of their highly repetitive features and assembly difficulties. In addition, the patterns of de novo mutations are likely to differ among families. The researchers said that many more multigenerational human family genome resources would need to be developed and made widely available to scientists to better establish ranges of de novo mutation rates.
Such multigenerational genome resources would further efforts, they said, to test new ways to detect genetic variants and to evaluate new sequencing technologies.
The scientists also believed that they were conservative in their estimates of mutation rates.
"More genomic variation, including de novo variation, remains to be discovered," they noted.
Porubsky and Eichler wrote that they were amazed by the dedication to science of the multigenerational family whose genomes the research team studied. For example, one of the individuals in the family (whose genome assembly is designated NA12878) has perhaps the most studied human genome in history. The researchers expressed thanks to all the members of the family for their long-standing commitment to furthering knowledge about human genetic variation and for making the data publicly available.