It was a landmark event when the human genome was finally sequenced and the Human Genome Project came to an end. But as with most things, the reality is more complicated. There are long stretches of repeat sequences in the human genome that are extremely challenging to decipher, in part because of the tools we use to sequence DNA. So it took many more years to generate a truly complete sequence that included every last base pair of the human genome. The other problem with that reference sequence is that it was originally made from a collection of individuals, in some cases taking stretches of sequence from one individual and combining them with stretches from others to make a composite map.
Over many years, scientists have led sequencing efforts that showed how individual human genomes differ from the original reference created by the Human Genome Project. However, many of these studies identified only very small changes, like single nucleotide polymorphisms, and not large structural variations.
Researchers have now been able to address this issue in an effort to capture the true complexity of the human genome and how it can vary from one person to another. Reporting in Science, a team of researchers has generated 64 full reference human genome sequences. This work will enable investigators to get a better look at how genetic variants, and genetic features carried in populations, can contribute to disease.
As genetic sequencing technologies have improved, scientists have been able to find large variations in the structure of DNA, including the insertion of long sequences. It's thought that structural variations contribute more to disruptions in gene function than small changes do.
The new dataset, created from 64 assembled human genomes, is meant to represent 25 different human populations from around the world. Each of these genomes was assembled without using pieces from the original composite reference, and the authors suggested that the dataset will be much better at illustrating how the genome may differ between populations.
"We've entered a new era in genomics where whole human genomes can be sequenced with exciting new technologies that provide more substantial and accurate reads of the DNA bases," said study co-author Scott Devine, Ph.D., Associate Professor of Medicine at the University of Maryland School of Medicine (UMSOM). "This is allowing researchers to study areas of the genome that previously were not accessible but are relevant to human traits and diseases."
The work also demonstrated that this approach can reveal more novel structural variants longer than 50 base pairs than previous methods, which have not been particularly precise. Now, researchers can genotype structural variants more accurately.
"The landmark new research demonstrates a giant step forward in our understanding of the underpinnings of genetically-driven health conditions," said E. Albert Reece, M.D., Ph.D., M.B.A., the John Z. and Akiko K. Bowers Distinguished Professor and Dean of the University of Maryland School of Medicine. "This advance will hopefully fuel future studies aimed at understanding the impact of human genome variation on human diseases."
Sources: AAAS/EurekAlert! via University of Maryland School of Medicine, Science