Biogeographical ancestry, variable selection, and PLS-DA method: a new panel to assess ancestry in forensic samples via MPS technology

Forensic Sci Int Genet. 2023 Jan:62:102806. doi: 10.1016/j.fsigen.2022.102806. Epub 2022 Nov 12.

Abstract

As evidenced by the large number of articles recently published in the literature, forensic scientists are making great efforts to infer externally visible features and biogeographical ancestry (BGA) from DNA analysis. Like phenotypic information, ancestry information obtained from DNA can provide investigative leads to identify victims (missing/unidentified persons, crime/armed conflict/mass disaster victims) or to trace perpetrators when no match is found with a reference profile or in a database. Recently, the advent of massively parallel sequencing (MPS) technologies, together with the ability to harness high-throughput genetic data, has made it possible to investigate the associations between phenotypic and genomic variation in worldwide human populations and to develop new BGA forensic tools capable of simultaneously analyzing up to millions of markers, for example when the hybridization capture approach used in ancient DNA studies is adopted to target SNPs of interest. In the present study, more than 3000 SNPs were selected to create a new BGA panel, and the accuracy of the panel in inferring the ancestry of unknown samples was evaluated with the partial least squares discriminant analysis (PLS-DA) method. Subsequently, the panel was assessed using three variable selection techniques (backward variable elimination, genetic algorithm, and regularized elimination procedure), and the SNPs most informative for inferring biogeographical ancestry at the inter- and intra-continental levels were selected. The resulting panels predict BGA with a reduced number of markers and are intended for routine forensic cases, where PCR amplification is the best choice to target SNPs.
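The PLS-DA evaluation and marker reduction described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' pipeline: the genotypes are simulated (three hypothetical populations, 50 SNPs coded as 0/1/2 allele counts), the helper names `pls_fit` and `pls_predict` are invented here, and the marker reduction is a single coefficient-magnitude filter that stands in for, and is much simpler than, the backward variable elimination, genetic algorithm, and regularized elimination procedures named in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (assumption): 3 populations x 60 individuals, 50 SNPs,
# genotypes coded as counts of one allele (0, 1 or 2).
n_pops, n_per_pop, n_snps = 3, 60, 50
freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_snps))  # allele frequency per population
labels = np.repeat(np.arange(n_pops), n_per_pop)
X = rng.binomial(2, freqs[labels]).astype(float)

# Random split into training and held-out samples.
order = rng.permutation(len(labels))
train, test = order[:120], order[120:]


def pls_fit(X, labels, n_components):
    """Fit PLS2 on one-hot class indicators (PLS-DA) by iterative deflation."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xk, Yk = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of the X'Y cross-product.
        u, _, _ = np.linalg.svd(Xk.T @ Yk, full_matrices=False)
        w = u[:, 0]
        t = Xk @ w                      # sample scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        q = Yk.T @ t / tt               # Y loadings
        Xk = Xk - np.outer(t, p)        # deflate both blocks
        Yk = Yk - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    # Regression coefficients, shape (n_features, n_classes).
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return {"B": B, "x_mean": x_mean, "y_mean": y_mean, "classes": classes}


def pls_predict(model, X):
    """Assign each sample to the class with the largest predicted indicator."""
    y_hat = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return model["classes"][np.argmax(y_hat, axis=1)]


# Full panel: fit on training samples, score on held-out samples.
model = pls_fit(X[train], labels[train], n_components=2)
accuracy = np.mean(pls_predict(model, X[test]) == labels[test])

# Reduced panel (simplified stand-in for the paper's selection methods):
# keep the 20 SNPs with the largest summed absolute coefficients and refit.
importance = np.abs(model["B"]).sum(axis=1)
top = np.argsort(importance)[::-1][:20]
reduced = pls_fit(X[train][:, top], labels[train], n_components=2)
accuracy_reduced = np.mean(pls_predict(reduced, X[test][:, top]) == labels[test])

print(f"full panel accuracy:    {accuracy:.2f}")
print(f"reduced panel accuracy: {accuracy_reduced:.2f}")
```

Because the simulated allele frequencies differ strongly between populations, the classes separate far more easily here than real intra-continental groups would; the sketch only shows the mechanics of fitting, predicting, and refitting on a smaller marker set.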

Keywords: Biogeographical ancestry; Features selection; Forensic samples; Machine learning; SNPs; Victim’s identification.

MeSH terms

  • DNA / genetics
  • Forensic Genetics* / methods
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Least-Squares Analysis
  • Phylogeography
  • Polymerase Chain Reaction
  • Polymorphism, Single Nucleotide
  • Population Groups* / genetics

Substances

  • DNA