An overview of SNP-SNP microhaplotypes in the 26 populations of the 1000 Genomes Project

Int J Legal Med. 2022 Sep;136(5):1211-1226. doi: 10.1007/s00414-022-02820-2. Epub 2022 Apr 9.

Abstract

Microhaplotypes (MHs) are a promising new type of forensic marker defined by combinations of two or more single-nucleotide polymorphisms (SNPs) within 200 bp. Their advantages, such as low mutation rates, the absence of stutter artifacts, and short amplicons, have improved human identification, kinship analysis, ancestry prediction, and mixture deconvolution. Information on published MHs, e.g., allele frequencies, is available in widely used public databases such as the ALlele FREquency Database (ALFRED) and MicroHapDB. However, many unpublished MHs are spread across the whole genome, and these databases do not incorporate other resources (e.g., the SNP Database, dbSNP) to provide users with more integrated information. Therefore, it is essential to establish a robust, responsive, and comprehensive MH database. In this study, we thoroughly screened for SNP-SNP MHs among the 26 populations of the 1000 Genomes Project (Phase 3). All genotype data of the SNPs in each MH were converted to PHASE input files, and allele frequencies were estimated using PHASE. We compiled a detailed summary of SNP-SNP MHs at the global, continental, and population levels, focusing on haplotypes and the effective number of alleles (Ae), and supplemented our database with dbSNP data (last updated in 2015). We have established a dual-SNP MH database (D-SNPsDB) of MHs within 50 bp for the 26 populations, integrating basic data such as physical positions in the human genome, variant identifier (rsID) mappings, allele frequencies, and basic variant information. For public queries, we developed the D-SNPsDB web app with the R Shiny package to provide this integrated information.
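To make the screening and the Ae summary statistic concrete, the sketch below pairs SNPs that lie within 50 bp of each other, counts two-SNP haplotype frequencies, and computes Ae = 1 / Σ pᵢ², where pᵢ are the haplotype frequencies. It is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, "within 50 bp" is assumed to mean a position difference of at most 50, and frequencies are counted directly from already-phased haplotypes instead of being estimated with PHASE as in the study.

```python
from collections import Counter


def snp_pairs_within(positions, max_dist=50):
    """Return index pairs (i, j) of SNPs at most max_dist bp apart.

    Assumption: positions are sorted ascending and lie on one chromosome;
    the <= comparison is a hypothetical reading of "within 50 bp".
    """
    pairs = []
    n = len(positions)
    for i in range(n):
        j = i + 1
        # Positions are sorted, so stop scanning once the gap exceeds max_dist.
        while j < n and positions[j] - positions[i] <= max_dist:
            pairs.append((i, j))
            j += 1
    return pairs


def haplotype_frequencies(haplotypes):
    """Relative frequency of each haplotype, one allele tuple per chromosome.

    Simplification: counts phased haplotypes directly (the study used PHASE).
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {hap: c / total for hap, c in counts.items()}


def effective_alleles(freqs):
    """Effective number of alleles: Ae = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


if __name__ == "__main__":
    positions = [100, 130, 160, 300]
    print(snp_pairs_within(positions))  # [(0, 1), (1, 2)]

    haps = [("A", "G"), ("A", "G"), ("A", "T"), ("C", "T")]
    freqs = haplotype_frequencies(haps)
    print(round(effective_alleles(freqs), 3))  # 2.667
```

Ae equals the number of haplotypes only when they are equally frequent (four equally common haplotypes give Ae = 4), so it summarizes how informative a two-SNP MH is in a given population.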

Keywords: Bioinformatics; Database; Forensic genetics; Microhaplotype.

MeSH terms

  • Gene Frequency
  • Genotype
  • Haplotypes
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Polymorphism, Single Nucleotide*