Over 40 tandem repeats undergo expansion events that lead to neurological disease. This number is likely an underestimate, as many repeats are difficult to amplify using existing short-read sequencing approaches. Through the use of long-read sequencing, we identified a 69 bp intronic variable number tandem repeat (VNTR) in WD repeat domain 7 (WDR7). The VNTR is expanded in a family with Amyotrophic Lateral Sclerosis (ALS) and exhibits significantly higher repeat copy number in three independent cohorts of individuals with sporadic ALS compared with matched controls, suggesting that it plays a role in modifying susceptibility to ALS. Each 69 bp repeat motif forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells – a common hallmark of neurodegenerative disease pathology. We performed multiplexed barcoded PacBio long-read sequencing to resolve the complete internal structure of the WDR7 repeat in 288 geographically diverse individuals. We found striking variability in both repeat length and internal nucleotide composition. Some of the 69 bp repeat motifs are specifically present or absent in certain geographic populations, and we identified common repeat motifs in both Denisovan and Neanderthal genomes. We found that the repeat expands in the 3′-5′ direction, in groups of two repeat units. Extending this analysis to characterize VNTRs genome-wide, we identified several VNTRs that differ in length (by up to 20 kb) among individuals and geographic super-populations, as well as repeats that exhibit substantial diversity in internal sequence composition. By characterizing the WDR7 VNTR in ALS, we identify features associated with repeat expansion dynamics, the mechanistic consequences of repeat expansions for ALS susceptibility, and the structure of repeats in geographically diverse populations that can precipitate expansion events.
Learning Objectives:
1. Understand the contribution of genetic risk factors to Amyotrophic Lateral Sclerosis
2. Describe novel methods to identify genetic susceptibility factors in neurodegenerative disease
3. Discuss how resolving the complete sequence of tandem repeats can help us understand patterns of repeat expansions throughout human history