Genome-wide association studies combined with k-fold cross-validation identify rs17822931 as an ancestry-informative marker in Han Chinese population

Electrophoresis. 2023 Aug;44(15-16):1187-1196. doi: 10.1002/elps.202200227. Epub 2023 May 15.

Abstract

DNA-based ancestry inference has long been a research focus in forensic science. Differentiating substructure within the Han Chinese population, such as the north-to-south substructure, would benefit forensic practice. In the present study, we enrolled participants from northern and southern China. Each participant was genotyped at ∼400 K single-nucleotide polymorphisms (SNPs), and CHB and CHS data from the 1000 Genomes Project were used to perform genome-wide association analyses. We also introduced a new method that combines genome-wide association study (GWAS) analyses with k-fold cross-validation for small sample sizes. As a result, one SNP, rs17822931, emerged with a p-value of 7.51E-6. We also simulated a large dataset to verify whether k-fold cross-validation could reduce the false-negative rate of GWAS. The allele frequencies of the identified SNP, ABCC11 rs17822931, have been reported to vary along a geographical gradient in humans. We also found a marked difference in the allele frequency distributions of rs17822931 among five different cohorts of the Chinese population. In conclusion, our study demonstrates that even a small-scale GWAS has the potential to identify effective loci when k-fold cross-validation is implemented, and it sheds light on rs17822931 as a potential marker for differentiating the north-to-south substructure of the Han Chinese population.
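The abstract does not spell out how the association test and the cross-validation are combined, so the following Python sketch shows only one plausible scheme and should not be read as the authors' procedure. It assumes a per-SNP allelic chi-square test (1 degree of freedom) between a northern and a southern group, run on each of the k training subsets with a relaxed threshold, and it keeps only the SNPs that pass in every subset. The function names, the threshold `alpha`, the value of `k`, and the simulated allele frequencies are all hypothetical.

```python
import math
import random


def allelic_p_value(genotypes, labels):
    """Chi-square (1 df) allelic test on a 2x2 table of allele counts by group.

    genotypes: copies of the alternate allele per sample (0, 1 or 2).
    labels: group per sample (0 = north, 1 = south; an assumed encoding).
    """
    alt = [0, 0]
    total = [0, 0]
    for g, y in zip(genotypes, labels):
        alt[y] += g
        total[y] += 2  # two alleles per diploid sample
    ref = [total[0] - alt[0], total[1] - alt[1]]
    n = total[0] + total[1]
    row_alt, row_ref = alt[0] + alt[1], ref[0] + ref[1]
    if 0 in (row_alt, row_ref, total[0], total[1]):
        return 1.0  # monomorphic SNP or empty group: no evidence of association
    chi2 = n * (alt[0] * ref[1] - alt[1] * ref[0]) ** 2 / (
        row_alt * row_ref * total[0] * total[1]
    )
    # Survival function of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


def kfold_stable_snps(genotype_matrix, labels, k=5, alpha=1e-3, seed=0):
    """Return indices of SNPs with p < alpha in every one of the k training subsets."""
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    stable = None
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        train_labels = [labels[i] for i in train]
        hits = set()
        for j, snp in enumerate(genotype_matrix):
            if allelic_p_value([snp[i] for i in train], train_labels) < alpha:
                hits.add(j)
        stable = hits if stable is None else stable & hits
    return sorted(stable)


def simulate(n_per_group=100, n_snps=200, seed=1):
    """Simulate genotypes; SNP 0 is differentiated between groups (made-up frequencies)."""
    rng = random.Random(seed)
    labels = [0] * n_per_group + [1] * n_per_group
    matrix = []
    for j in range(n_snps):
        if j == 0:
            freqs = (0.95, 0.60)  # hypothetical values, not taken from the paper
        else:
            f = rng.uniform(0.1, 0.9)
            freqs = (f, f)  # null SNP: same frequency in both groups
        matrix.append(
            [sum(rng.random() < freqs[y] for _ in range(2)) for y in labels]
        )
    return matrix, labels


if __name__ == "__main__":
    matrix, labels = simulate()
    print("SNPs passing in all folds:", kfold_stable_snps(matrix, labels))
```

Requiring a hit in every training subset lets the per-fold threshold be looser than a conventional genome-wide cutoff, which is one way a small study might retain a true signal while still filtering out unstable ones. Whether this matches the design evaluated in the paper's simulation is not stated in the abstract.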

Keywords: GWAS; SNP; forensic genetics; population structure.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China
  • East Asian People* / genetics
  • Gene Frequency
  • Genetics, Population*
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Polymorphism, Single Nucleotide