Developmental validation of PACE™: Automated artifact identification and contributor estimation for use with GlobalFiler™ and PowerPlex® fusion 6c generated data

Forensic Sci Int Genet. 2019 Nov:43:102140. doi: 10.1016/j.fsigen.2019.102140. Epub 2019 Aug 8.

Abstract

DNA mixture interpretation remains one of the major challenges in forensic DNA analysis. DNA mixture samples are inherently complex due to several factors, including variation in the quantity of DNA, the presence of non-allelic artifactual peaks, and the presence of multiple contributors with variable levels of allele sharing. The Probabilistic Assessment for Contributor Estimation (PACE) is a fully continuous, probabilistic, machine learning-based method for predicting the number of contributors (n) in a sample. It was previously developed for use with the Identifiler amplification kit. That system required manual preprocessing of data and was limited exclusively to samples amplified with that kit. This study introduces PACE™ v1.3.7 for use with both the GlobalFiler and PowerPlex Fusion 6c amplification kits. An automated artifact identification and management system has been added alongside the rapid estimation of the number of donors in a given mixture. When evaluated on previously unseen data, the artifact management module identified true allelic peaks and removed artifacts such as elevated baseline noise, stutter, and pull-up with an accuracy of over 93.5%. The systems for both kits yield the correct classification of n in over 90% of samples and maintain consistent accuracy as the number of donors and overall mixture complexity increase. Misclassified samples generally exhibited high levels of allele sharing among donors, low DNA template amounts, and a high incidence of allelic dropout. This system provides both artifact management and n estimation, as well as a quantitative and reproducible method for assessing profile quality.
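The abstract describes PACE only at a high level, as a probabilistic classifier for n (the keywords name random forests). The sketch below is not the PACE implementation. It illustrates the general idea of random forest soft voting for contributor estimation, under several assumptions. It takes a profile that has already passed artifact removal and computes three illustrative features (`max_alleles_per_locus`, `total_allele_count`, `mean_peak_height`), which are hypothetical and not the published feature set. It also replaces trained trees with hand-set depth-1 `Stump` trees so that it runs without training data. The forest averages each tree's class probabilities and reports the most probable n.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical input: a profile AFTER artifact removal,
# mapping locus name -> {allele designation: peak height in RFU}.
Profile = dict[str, dict[str, float]]


def profile_features(profile: Profile) -> dict[str, float]:
    """Illustrative profile-level features (hypothetical, not PACE's feature set)."""
    counts = [len(alleles) for alleles in profile.values()]
    heights = [h for alleles in profile.values() for h in alleles.values()]
    return {
        "max_alleles_per_locus": float(max(counts)),
        "total_allele_count": float(sum(counts)),
        "mean_peak_height": mean(heights),
    }


@dataclass
class Stump:
    """A depth-1 decision tree: one feature test, one class distribution per branch."""
    feature: str
    threshold: float
    left: dict[int, float]   # P(n) when feature <= threshold
    right: dict[int, float]  # P(n) when feature > threshold

    def predict_proba(self, x: dict[str, float]) -> dict[int, float]:
        return self.left if x[self.feature] <= self.threshold else self.right


def forest_predict(forest: list[Stump], x: dict[str, float]) -> tuple[int, dict[int, float]]:
    """Soft voting: average per-tree class probabilities, then take the most probable n."""
    classes = sorted({n for t in forest for n in (*t.left, *t.right)})
    probs = {n: mean(t.predict_proba(x).get(n, 0.0) for t in forest) for n in classes}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    # Toy profile and hand-set "trees"; all values are made up for illustration.
    profile = {
        "D3S1358": {"15": 900.0, "16": 850.0, "17": 400.0},
        "vWA": {"14": 700.0, "17": 650.0},
        "FGA": {"21": 500.0, "22": 480.0, "24": 300.0, "25": 250.0},
    }
    forest = [
        Stump("max_alleles_per_locus", 2, {1: 0.9, 2: 0.1}, {1: 0.05, 2: 0.6, 3: 0.35}),
        Stump("total_allele_count", 8, {1: 0.5, 2: 0.5}, {2: 0.4, 3: 0.6}),
    ]
    n, probs = forest_predict(forest, profile_features(profile))
    print(n, probs)
```

With the toy values above, both stumps take their right branch, and the averaged probabilities favor n = 2 (about 0.5) over n = 3 (about 0.475). Reporting the full probability vector, rather than only the argmax, reflects the probabilistic framing of n estimation described above.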

Keywords: DNA mixture; artifact identification; complex interpretation; machine learning; number of contributors; random forest.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Alleles
  • Artifacts*
  • DNA / genetics*
  • DNA Fingerprinting / methods*
  • Humans
  • Machine Learning*
  • Models, Statistical
  • Polymerase Chain Reaction

Substances

  • DNA