We learn that we inherit two copies of every gene, one from each of our parents, but the story is a bit more complex. Some parts of the genome may be duplicated, or deleted, in some people, so the number of copies of certain portions of the genome can vary from one person to the next. These regions are known as copy number variants (CNVs), and they are defined as regions of the genome that are more than 1,000 base pairs but fewer than 5 megabases in length. They may be regions that are deleted, duplicated, or repeated, and they can change the structure of the genome. It's been estimated that there are hundreds of thousands of CNVs in the typical human genome, and there are genomic hotspots for CNVs. They are thought to account for a significant amount of variability from one person to another, and it's been suggested that they've played an important role in evolution. CNVs can also disrupt portions of the genome in ways that lead to disease.
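For readers who think in code, the size definition above can be written as a simple check. This is only an illustrative sketch: the function names, the half-open coordinate convention, and the assumption of a baseline of two copies (which applies to most, but not all, of the genome) are ours, and none of this is taken from the tool described in the study.

```python
# Size bounds from the definition above: more than 1,000 base pairs,
# fewer than 5 megabases (5,000,000 base pairs).
MIN_CNV_BP = 1_000
MAX_CNV_BP = 5_000_000

# Assumed baseline: two copies, one inherited from each parent.
EXPECTED_COPIES = 2


def is_cnv_sized(start: int, end: int) -> bool:
    """Return True if the region [start, end) falls within the CNV size range.

    Coordinates are assumed to be 0-based and half-open, so the length
    of the region is simply end - start.
    """
    length = end - start
    return MIN_CNV_BP < length < MAX_CNV_BP


def describe_copy_number(copies: int) -> str:
    """Label a region by comparing its copy number to the assumed baseline."""
    if copies < 0:
        raise ValueError("copy number cannot be negative")
    if copies < EXPECTED_COPIES:
        return "deletion"
    if copies > EXPECTED_COPIES:
        return "duplication"
    return "typical"


if __name__ == "__main__":
    # A hypothetical 20,000-base-pair region carried in three copies.
    print(is_cnv_sized(100_000, 120_000), describe_copy_number(3))
```

Under these assumptions, a 20,000-base-pair region carried in three copies would count as a CNV-sized duplication, while a region of exactly 1,000 base pairs would fall just outside the range.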
Scientists have now created a computational tool that identified 15 million CNVs in human genomic data in the UK Biobank. The researchers looked for links between human traits and CNVs, and have revealed a variety of new connections. The findings have been reported in Cell.
A lot of human genetic research has focused on very small changes, or single nucleotide polymorphisms (SNPs), in the sequence of the coding portion of the genome. For this study, the researchers wanted to improve the detection of CNVs, since most tools aim to assess SNPs, noted first study author and postdoctoral researcher Margaux Hujoel.
They leveraged the data in the UK Biobank, focusing on people who were distantly related through shared haplotypes, which are larger sections of DNA that contain many SNPs and have been inherited together. Their approach revealed six times more CNVs than previous studies have identified. The CNVs found in this study accounted for half of the inactivations of genes that have been associated with structural genomic changes.
The researchers analyzed links between those CNVs and 56 traits, such as blood pressure, bone mineral density, indicators of lung function, and blood cell counts. Over 250 associations between phenotype and copy number variation were identified. New ties were revealed between certain genes and various traits, like height. For example, some carriers of very rare CNVs impacting a gene called UHRF2 were an average of about seven centimeters shorter compared to people who did not carry the CNV. Some other rare variants had strong effects that were linked to disease.
"This tool should be readily applicable for conducting the same sort of analysis in other ancestry groups, which could turn up quite different and interesting genetic associations," said senior study author Po-Ru Loh, an assistant professor at Brigham and Women's Hospital and Harvard Medical School.
The researchers also noted that most CNVs haven't been discovered yet, even those in the UK Biobank. Large biobanks have generated SNP data by focusing on certain places in the genome, so most CNVs have not been found. The method used in this study is now being adapted so that it can be applied to the protein-coding portion of the genome, known as the exome. Other researchers might eventually use it to analyze the whole genome.
"We view our work as both a methodology that hopefully will continue to be useful and adaptable to other sources of data, and also as more motivation for people to continue delving into the ways that structural variation shapes human traits," Hujoel said.
Sources: Broad Institute of MIT and Harvard, Cell