A Continuous Statistical Phasing Framework for the Analysis of Forensic Mitochondrial DNA Mixtures

Genes (Basel). 2021 Jan 20;12(2):128. doi: 10.3390/genes12020128.

Abstract

Despite the benefits of quantitative data generated by massively parallel sequencing, resolving mitotypes from mixtures occurring in certain ratios remains challenging. In this study, a bioinformatic mixture deconvolution method centered on population-based phasing was developed and validated. The method was first tested on 270 in silico two-person mixtures varying in mixture proportions. An assortment of external reference panels containing information on haplotypic variation (from similar and different haplogroups) was leveraged to assess the effect of panel composition on phasing accuracy. Building on these simulations, mitochondrial genomes from the Human Mitochondrial DataBase were sourced to populate the panels, and key parameter values were identified by deconvolving an additional 7290 in silico two-person mixtures. Finally, employing an optimized reference panel and phasing parameters, the approach was validated with in vitro two-person mixtures with differing proportions. Deconvolution was most accurate when the haplotypes in the mixture were similar to haplotypes present in the reference panel and when the mixture ratio was intermediate (e.g., 4:1), that is, neither highly imbalanced nor subequal. Overall, errors in haplotype estimation were largely bounded by the accuracy of the mixture's genotype results. The proposed framework is the first available approach that automates the reconstruction of complete individual mitotypes from mixtures, even in ratios that have traditionally been considered problematic.
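To make the role of the mixture ratio concrete, the following is a minimal illustrative sketch and not the authors' pipeline. It simulates read counts for an in silico two-person mixture and then separates the contributors using allele fractions alone, with no reference panel. The function names (`simulate_mixture`, `naive_split`), the read depth, the 2% threshold, and the toy six-site haplotypes are all assumptions introduced here for illustration.

```python
import random

def simulate_mixture(hap_major, hap_minor, minor_fraction, depth=1000, seed=0):
    """Simulate per-site (ref_count, alt_count) for a two-person mixture.

    Haplotypes are lists of 0 (reference allele) or 1 (alternate allele).
    Depth and the binomial sampling model are illustrative assumptions.
    """
    rng = random.Random(seed)
    counts = []
    for a, b in zip(hap_major, hap_minor):
        # Expected alternate-allele fraction is the proportion-weighted sum.
        p_alt = (1 - minor_fraction) * a + minor_fraction * b
        alt = sum(rng.random() < p_alt for _ in range(depth))
        counts.append((depth - alt, alt))
    return counts

def naive_split(counts, het_threshold=0.02):
    """Assign alleles to contributors from allele fractions only.

    This is a simple threshold rule (hypothetical, no population phasing):
    the allele with the larger fraction is assigned to the major contributor.
    """
    major, minor = [], []
    for ref, alt in counts:
        frac = alt / (ref + alt)
        if frac < het_threshold:          # both contributors carry reference
            major.append(0); minor.append(0)
        elif frac > 1 - het_threshold:    # both contributors carry alternate
            major.append(1); minor.append(1)
        elif frac > 0.5:                  # alternate allele is the majority
            major.append(1); minor.append(0)
        else:                             # alternate allele is the minority
            major.append(0); minor.append(1)
    return major, minor

# Toy haplotypes (hypothetical) mixed at 4:1, i.e. a minor fraction of 0.2.
hap_major = [0, 1, 1, 0, 1, 0]
hap_minor = [0, 1, 0, 1, 0, 1]
counts = simulate_mixture(hap_major, hap_minor, minor_fraction=0.2)
est_major, est_minor = naive_split(counts)
```

At an intermediate ratio such as 4:1, the two contributors produce clearly separated allele fractions (near 0.2 and 0.8 at sites where they differ), so even this naive rule can separate them. At a subequal ratio (minor fraction near 0.5), the fractions at differing sites cluster around 0.5 and carry essentially no information about which contributor holds which allele. That is the kind of case in which additional information, such as the haplotype reference panels described in the abstract, is needed.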

Keywords: Bayesian inference; DEploid; Ion Torrent; R; bioinformatics; computational phasing; forensic genetics; massively parallel sequencing; mtDNA mixture deconvolution; population genomics.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Bayes Theorem
  • Computational Biology / methods
  • DNA, Mitochondrial*
  • Forensic Genetics / methods*
  • Genome, Mitochondrial
  • Genomics / methods
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Models, Statistical*
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods

Substances

  • DNA, Mitochondrial