Development of the decision tree model for distinguishing individuals of Chinese four surnames from Zhanjiang Han population based on Y-STR haplotypes

Leg Med (Tokyo). 2021 Mar:49:101848. doi: 10.1016/j.legalmed.2021.101848. Epub 2021 Jan 15.

Abstract

Co-separation studies of surnames and Y-chromosomal genetic markers can help reveal population migrations, surname origins and population formation histories, and can support forensic familial searching. We investigated the genetic distributions of 27 Y-STRs in four Chinese surnames (Li, Lin, Chen and Huang) from the Zhanjiang Han population. We also attempted to develop a decision tree model for surname prediction based on Y-STR haplotypes. Allelic frequencies of the 27 Y-STRs showed that some alleles were observed in only one surname; in addition, some alleles were more frequent in one surname than in the others, implying that these alleles might serve as useful indicators for surname prediction. Haplotype match probability values of the 27 Y-STRs in these surnames indicated that the system could be a valuable tool for forensic male identification. The decision tree model performed well on the training set, with an accuracy of 0.9860, and achieved relatively high accuracy (>0.70) for surname prediction on the testing set. In summary, we explored the power of machine learning for surname prediction based on the obtained Y-STR haplotypes, which showed promising value for forensic familial searching.
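
The two descriptive statistics mentioned in the abstract, per-surname allele frequencies and haplotype match probability, can be illustrated with a minimal Python sketch. It assumes the common definition of haplotype match probability as the sum of squared haplotype frequencies in a sample; the abstract does not state which definition the study used. The function names, the three-locus haplotypes and the toy samples are hypothetical and are not data from the study.

```python
from collections import Counter


def haplotype_match_probability(haplotypes):
    """Sum of squared haplotype frequencies (assumed definition of HMP)."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def allele_frequencies_by_surname(samples, locus_index):
    """Return {surname: {allele: frequency}} for a single locus."""
    per_surname = {}
    for surname, haplotype in samples:
        per_surname.setdefault(surname, Counter())[haplotype[locus_index]] += 1
    return {
        surname: {allele: c / sum(cnt.values()) for allele, c in cnt.items()}
        for surname, cnt in per_surname.items()
    }


# Hypothetical toy data: (surname, alleles at three made-up loci).
toy_samples = [
    ("Li", (14, 30, 23)),
    ("Li", (14, 30, 23)),
    ("Lin", (15, 29, 24)),
    ("Chen", (14, 31, 23)),
]

hmp = haplotype_match_probability([h for _, h in toy_samples])
freqs = allele_frequencies_by_surname(toy_samples, locus_index=0)
```

In this toy sample, allele 15 at the first locus occurs only in the surname Lin, which is the kind of surname-specific allele the abstract suggests could inform prediction. A lower match probability means two randomly chosen males are less likely to share a haplotype.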

Keywords: Forensic familial searching; Surname inferences; The decision tree; Y-STRs; Zhanjiang Han.

MeSH terms

  • Asian People / genetics*
  • China
  • Chromosomes, Human, Y / genetics*
  • Decision Trees*
  • Forensic Genetics / methods*
  • Gene Frequency / genetics
  • Genetic Markers / genetics*
  • Genetics, Population / methods*
  • Haplotypes / genetics*
  • Humans
  • Male
  • Microsatellite Repeats / genetics*
  • Names*
  • Pedigree

Substances

  • Genetic Markers