Performance and characterization of 94 identity-informative SNPs in Northern Han Chinese using ForenSeq™ DNA Signature Prep Kit

J Forensic Leg Med. 2024 Apr:103:102678. doi: 10.1016/j.jflm.2024.102678. Epub 2024 Mar 21.

Abstract

Target and flanking region (FR) variation at 94 identity-informative SNPs (iSNPs) is investigated in 635 Northern Han Chinese using the ForenSeq DNA Signature Prep Kit on the MiSeq FGx Forensic Genomics System. The dataset shows the following performance characteristics (average values): ≥60% of bases with a quality score of 20 or higher (%≥Q20); >700× depth of coverage (DoC) in both the Sample Details Reports and the Flanking Region Reports; >80% effective reads; ≥60% allele coverage ratio (ACR); and ≥70% inter-locus balance. Some consistently low-performing characteristics are also observed: low DoC at rs1736442, rs1031825, rs7041158, rs338882, rs2920816, rs1493232, rs719366, and rs2342747; high noise at rs891700; and imbalanced ACR at rs6955448 and rs338882. The average amplicon length is 69 bp, which is suitable for analyzing degraded samples. Bioinformatic concordance between the ForenSeq Universal Analysis Software (UAS) and Integrative Genomics Viewer (IGV) inspection reaches 99.99%. Discordance results from flanking region deletions at rs10776839, rs8078417, rs2831700, and rs1454361. FR variants within amplicons detected by massively parallel sequencing (MPS) increase the number of unique alleles, the effective number of alleles (Ae), and the observed heterozygosity (Hobs) by 46.81%, 4.51%, and 3.29%, respectively. Twelve FR variants are reported to dbSNP for the first time: rs1252699848, rs1665500714, rs1771121532, rs2097285015, rs1851671415, rs2045669877, rs2046758811, rs2044248635, rs1251308240, rs1968822112, rs1981638299, and rs1341756746. All 94 iSNPs, from both target and amplicon data, are in Hardy-Weinberg equilibrium (HWE) and independent within autosomes. As expected, the amplicon data yield significantly higher forensic parameters for the combined power of discrimination (CPD = 1 − 3.9876 × 10⁻³⁸) and the combined power of exclusion (CPE = 1 − 6.6690 × 10⁻⁸).
Additionally, combining 27 sequence-based autosomal STRs with the 94 iSNP amplicons substantially improves system effectiveness (CPD = 1 − 6.7054 × 10⁻⁷² and CPE = 1 − 4.4719 × 10⁻²⁰) compared with either marker type alone. In conclusion, we have established a traditional length-based and current sequence-based reference database with 58 STRs and 94 iSNPs in the Northern Han Chinese population. We hope these data can serve as a solid reference and foundation for forensic practice.
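
As a rough illustration of the quantities named in the abstract, the sketch below computes the effective number of alleles (Ae), an allele coverage ratio (ACR), and a combined power across loci. It is not the authors' pipeline. The formulas are the standard textbook forms (Ae = 1 / Σp², combined power = 1 − Π(1 − P) under locus independence), the ACR definition (lower read count over higher read count in a heterozygote) is an assumption, and all function names and input values are hypothetical. Because values such as 1 − 3.9876 × 10⁻³⁸ round to exactly 1.0 in double precision, the combined power is returned as its complement.

```python
import math


def effective_alleles(freqs):
    """Effective number of alleles at one locus: Ae = 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs)


def allele_coverage_ratio(reads_a, reads_b):
    """ACR of a heterozygous genotype (assumed definition):
    lower allele read count divided by higher allele read count."""
    return min(reads_a, reads_b) / max(reads_a, reads_b)


def combined_complement(per_locus_power):
    """Return 1 - combined power, i.e. prod(1 - P_i), assuming the
    loci are independent. Works for CPD or CPE alike."""
    return math.prod(1.0 - p for p in per_locus_power)


# Hypothetical inputs, for illustration only (not data from the study).
ae = effective_alleles([0.5, 0.5])          # biallelic SNP, equal frequencies
acr = allele_coverage_ratio(300, 500)       # read counts of the two alleles
one_minus_cpd = combined_complement([0.6] * 94)  # 94 loci, PD = 0.6 each
print(f"Ae = {ae}, ACR = {acr}, CPD = 1 - {one_minus_cpd:.4e}")
```

Reporting the complement (1 − CPD or 1 − CPE) mirrors how the abstract states its results and avoids the loss of precision that would occur if the value were stored as a number very close to 1.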

Keywords: ForenSeq™ DNA Signature Prep Kit; Identity-informative single nucleotide polymorphism (iSNP); Massively parallel sequencing (MPS); MiSeq FGx® Forensic Genomics System; Northern Han Chinese; Population genetics.

MeSH terms

  • China
  • DNA Fingerprinting*
  • East Asian People / genetics
  • Ethnicity / genetics
  • Female
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Male
  • Polymorphism, Single Nucleotide*
  • Sequence Analysis, DNA*