A single nucleotide polymorphism panel for individual identification and ancestry assignment in Caucasians and four East and Southeast Asian populations using a machine learning classifier

Forensic Sci Med Pathol. 2019 Mar;15(1):67-74. doi: 10.1007/s12024-018-0071-y. Epub 2019 Jan 16.

Abstract

Single nucleotide polymorphism (SNP) profiling is an effective means of individual identification and ancestry inferences in forensic genetics. This study established a SNP panel for the simultaneous individual identification and ancestry assignment of Caucasian and four East and Southeast Asian populations. We analyzed 220 SNPs (125 autosomal, 17 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs) of the DNA samples from 563 unrelated individuals of five populations (89 Caucasian, 234 Taiwanese Han, 90 Filipino, 79 Indonesian and 71 Vietnamese) and 18 degraded DNA samples. Informativeness for assignment (In) was used to select ancestry informative SNPs (AISNPs). A machine learning classifier, support vector machine (SVM), was used for ancestry assignment. Of the 220 SNPs, 62 were individual identification SNPs (IISNPs) (51 autosomal and 11 X-chromosomal SNPs) and 191 were AISNPs (100 autosomal, 13 X-chromosomal, 30 Y-chromosomal, and 48 mitochondrial SNPs). The 51 autosomal IISNPs offered cumulative random match probabilities (cRMPs) ranging from 1.56 × 10-21 to 3.16 × 10-22 among these five populations. Using AISNPs with the SVM, the overall accuracy rate of ancestry inference achieved in the testing dataset between Caucasian, Taiwanese Han, and Filipino populations was 88.9%, whereas it was 70.0% between Caucasians and each of the four East and Southeast Asian populations. For the 18 degraded DNA samples with incomplete profiling, the accuracy rate of ancestry assignment was 94.4%. We have developed a 220-SNP panel for simultaneous individual identification and ethnic origin differentiation between Caucasian and the four East and Southeast Asian populations. This SNP panel may assist with DNA analysis of forensic casework.

Keywords: Ancestry assignment; Array; Individual identification; Machine learning classifier; Single nucleotide polymorphism; Support vector machine.

MeSH terms

  • Asia
  • Asian People / genetics*
  • Chromosomes, Human, X
  • Chromosomes, Human, Y
  • DNA Degradation, Necrotic
  • DNA Fingerprinting / methods*
  • DNA, Mitochondrial
  • Female
  • Gene Frequency
  • Genetics, Population*
  • Genotype
  • Humans
  • Machine Learning*
  • Male
  • Polymorphism, Single Nucleotide*
  • Retrospective Studies
  • Support Vector Machine
  • White People / genetics

Substances

  • DNA, Mitochondrial