Phylogenetic analyses of 41 Y-STRs and machine learning-based haplogroup prediction in the Qingdao Han population from Shandong province, Eastern China

Ann Hum Biol. 2023 Feb;50(1):35-41. doi: 10.1080/03014460.2023.2168057.

Abstract

Background: Known for its rich history and culture, Qingdao is a typical symbol of Chinese maritime culture. Its unique genetic landscape has attracted interest from geneticists and forensic scientists. However, this landscape has not yet been systematically characterised.

Aim: This investigation aims to shed light on Qingdao's paternal genetic diversity and its evolutionary connections to other Han subgroups.

Subjects and methods: The genetic polymorphisms of 41 Y-chromosomal short tandem repeat (Y-STR) loci in the Qingdao Han were investigated using the SureID® PathFinder Plus Kit. Phylogenetic analyses were performed using genotype data from 52 East Asian groups at 23 common Y-STR loci. A multidimensional scaling plot and a cladogram were constructed. Linear Discriminant Analysis (LDA) was carried out to predict categories among the Han people. The k-nearest neighbour (kNN) algorithm was utilised to assign a Y-SNP haplogroup to each haplotype.
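
The kNN assignment step can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the distance metric, the value of k, or the reference panel, so the Manhattan distance on repeat counts, k = 3, the four-locus haplotypes and the function names `manhattan` and `knn_predict` below are all assumptions made for illustration. Only the haplogroup labels O2a2 and O2a1 come from the study.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute repeat-count differences across loci (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(query, reference, k=3):
    """Return the majority haplogroup among the k nearest reference haplotypes."""
    # Rank reference haplotypes by distance to the query haplotype.
    ranked = sorted(reference, key=lambda item: manhattan(query, item[0]))
    # Majority vote over the labels of the k closest haplotypes.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy reference panel: (repeat counts at 4 loci, haplogroup label).
# Real data would use the shared Y-STR loci and haplotypes with known Y-SNP calls.
reference = [
    ((14, 12, 23, 10), "O2a2"),
    ((14, 13, 23, 10), "O2a2"),
    ((15, 12, 24, 11), "O2a2"),
    ((16, 14, 21, 13), "O2a1"),
    ((17, 14, 21, 13), "O2a1"),
    ((16, 15, 22, 12), "O2a1"),
]

# Assign a haplogroup to an untyped (hypothetical) haplotype.
prediction = knn_predict((14, 12, 24, 10), reference, k=3)
print(prediction)
```

In practice the choice of k and of the distance measure would be tuned against haplotypes whose haplogroups are already known from Y-SNP typing.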

Results: The Qingdao Han were genetically distant from the Tibeto-Burman populations and close to the Han people from northern China. LDA indicated deep integration among present-day Han people. The kNN model predicted O2a2 and O2a1 to be the predominant Y-SNP haplogroups.

Conclusions: This study should help to reconstruct the patrilineal history of China and to establish a more comprehensive Y-STR database.

Keywords: Qingdao Han; Y-STR; machine learning; patrilineal history; phylogeny.

MeSH terms

  • China
  • Chromosomes, Human, Y / genetics
  • Ethnicity* / genetics
  • Gene Frequency
  • Genetics, Population*
  • Haplotypes
  • Humans
  • Microsatellite Repeats / genetics
  • Phylogeny