Short tandem repeats (STRs) are gold-standard genetic loci used for source attribution of evidentiary material in legal matters. Their power lies in their high heterozygosity and large allele spread. Commercially available STR assays leverage linkage disequilibrium (LD)-independent STRs (i.e., alleles at one STR are not correlated with alleles at another STR) to calculate the product of allele frequencies in a population. The result of the product rule is a random match probability (RMP) describing the rarity of the observed alleles in an STR profile in the context of some reference population. In collaboration with Dr. Nicole Novroski, our team is addressing two major STR limitations using recent advances in genomics and computational biology. First, the size of STR amplicons prohibits genotyping from degraded substrates such as human remains. To address this, we leveraged the LD structure of the genome to predict unobserved, forensically relevant STR genotypes from identity-informative single nucleotide polymorphisms (SNPs). We present a bioinformatic workflow for predicting STRs, along with the resulting population genetic parameters used to evaluate the reliability of predicted genotypes. Second, mixed STR profiles are a major obstacle to the forensic DNA practitioner due, in part, to the ability of robust modern STR assays to detect low-level contributors. We simulated 200 populations modeled from four United States population groups using the forensim R package, yielding 155,400 individuals (27 STRs per individual). Across the 200 populations, we frequently observed evidence of residual population stratification, detected as Hardy-Weinberg equilibrium deviations and substantial allele frequency differences. In 2,400 mixed DNA profiles (2-, 3-, and 4-person), we show that variation in allele frequency across simulated populations significantly altered likelihood ratios (LRs) from 2-person mixtures but did not influence 3- or 4-person mixtures.
Our findings demonstrate that studying STRs in a forensic context can (i) be cost-effective for early career researchers, (ii) be extremely well powered because sample size has almost no upper limit, and (iii) provide novel insights into the behavior of STR mixtures.
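The product rule mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the workflow used in this work. It assumes Hardy-Weinberg genotype proportions at each locus (p² for homozygotes, 2pq for heterozygotes), LD-independent loci, and no correction for population substructure. The locus names, alleles, and frequencies are hypothetical placeholders, not real reference data.

```python
from math import prod


def genotype_frequency(allele_a, allele_b, freqs):
    """Expected genotype frequency at one locus.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch):
    p^2 for a homozygote, 2pq for a heterozygote.
    """
    p = freqs[allele_a]
    q = freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def random_match_probability(profile, allele_freqs):
    """Product rule: multiply per-locus genotype frequencies.

    Valid only if the loci are independent (no LD between them).
    """
    return prod(
        genotype_frequency(a, b, allele_freqs[locus])
        for locus, (a, b) in profile.items()
    )


# Hypothetical allele frequencies for two made-up loci.
allele_freqs = {
    "LOCUS_A": {"12": 0.10, "13": 0.30},
    "LOCUS_B": {"8": 0.20, "9": 0.25},
}

# Hypothetical two-locus profile: heterozygous at LOCUS_A, homozygous at LOCUS_B.
profile = {"LOCUS_A": ("12", "13"), "LOCUS_B": ("8", "8")}

rmp = random_match_probability(profile, allele_freqs)
print(f"RMP = {rmp:.6f}")
```

Here the heterozygous locus contributes 2 × 0.10 × 0.30 and the homozygous locus contributes 0.20², so the RMP is their product. Because every term depends on the reference allele frequencies, shifts in those frequencies between populations propagate directly into the final statistic.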
Learning Objectives:
1. Describe two major limitations of loci used for source attribution (e.g., SNPs and STRs)
2. Compare and contrast the benefits and limitations of simulated data in forensic science research
3. Explain how population genetics concepts apply to forensic DNA casework and research