Optimizing body fluid recognition from microbial taxonomic profiles

Forensic Sci Int Genet. 2018 Nov:37:13-20. doi: 10.1016/j.fsigen.2018.07.012. Epub 2018 Jul 30.

Abstract

In forensics, the DNA profile is used to identify the person who left a biological trace, but information on the body fluid can also be essential in the evidence evaluation process. Microbial composition data could potentially be used for body fluid recognition as an improved alternative to the presumptive tests currently in use. We have developed a customized workflow for interpreting bacterial 16S sequence data, based on a model that combines Partial Least Squares (PLS) with Linear Discriminant Analysis (LDA). Large data sets from the Human Microbiome Project (HMP) and the American Gut Project (AGP) were used to test different settings in order to optimize performance. In the initial cross-validation of body fluid recognition within the HMP data, the optimal overall accuracy was close to 98%. Sensitivity values for the fecal and oral samples were ≥0.99, followed by the vaginal samples at 0.98 and the skin and nasal samples at 0.96 and 0.81, respectively. Specificity values were high for all five categories, mostly >0.99. This optimal performance was achieved with the following settings: taxonomic profiles based on operational taxonomic units (OTUs) at 0.98 identity (OTU98), Aitchison's simplex transform with a pseudo-count of C = 1, and no regularization (r = 1) in the PLS step. Variable selection did not improve performance further. To test robustness across sequencing platforms, we also trained the classifier on the HMP data and tested it on the AGP data set. In this case, the standard OTU-based approach showed a moderate decline in accuracy. However, by using taxonomic profiles made by assigning reads directly to a genus, we were able to nearly maintain the high accuracy levels. The optimal combination of settings was kept, except that the taxonomic level was genus instead of OTU98. Performance may be improved even further by using higher-resolution taxonomic bins.
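Two steps named in the abstract can be illustrated without the full classifier: the log-ratio transform of read counts with a pseudo-count C, and the per-class sensitivity and specificity used to report performance. The sketch below is a minimal illustration under stated assumptions, not the authors' workflow. It assumes that "Aitchison's simplex transform with C = 1" means adding C to every count, closing the profile to proportions, and taking the centred log-ratio (one common Aitchison transform; the paper's exact definition may differ). The function names `clr_with_pseudocount` and `per_class_sensitivity_specificity` are hypothetical, and the PLS + LDA model itself is not shown.

```python
import math
from typing import Dict, List, Sequence


def clr_with_pseudocount(counts: Sequence[float], c: float = 1.0) -> List[float]:
    """Centred log-ratio of one sample's taxon counts after adding pseudo-count c.

    Assumption: an Aitchison-style transform of this form; the paper's
    exact definition may differ.
    """
    shifted = [x + c for x in counts]           # pseudo-count avoids log(0) for absent taxa
    total = sum(shifted)
    proportions = [x / total for x in shifted]  # closure onto the simplex (clr is scale invariant)
    logs = [math.log(p) for p in proportions]
    mean_log = sum(logs) / len(logs)            # log of the geometric mean
    return [v - mean_log for v in logs]         # centred values sum to zero


def per_class_sensitivity_specificity(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest sensitivity and specificity for each body fluid label."""
    pairs = list(zip(y_true, y_pred))
    results: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        results[label] = {
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
            "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        }
    return results


if __name__ == "__main__":
    print(clr_with_pseudocount([0, 3, 7], c=1.0))
    print(per_class_sensitivity_specificity(
        ["oral", "oral", "fecal", "skin"],
        ["oral", "fecal", "fecal", "skin"],
    ))
```

In this toy example one oral sample is misclassified as fecal, so the oral class has sensitivity 0.5 and the fecal class has specificity 2/3, while the remaining values stay at 1.0. The pseudo-count matters because taxon profiles are sparse: without it, any zero count would make the logarithm undefined.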

Keywords: Discriminants; Forensics; Massive parallel sequencing; Microbiome; PLS.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / genetics*
  • Discriminant Analysis
  • Feces / microbiology*
  • Female
  • Forensic Genetics / methods
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Least-Squares Analysis
  • Microbiota
  • Mouth / microbiology*
  • Nasal Cavity / microbiology*
  • RNA, Ribosomal, 16S*
  • Sensitivity and Specificity
  • Sequence Analysis, DNA
  • Skin / microbiology*
  • Vagina / microbiology*

Substances

  • RNA, Ribosomal, 16S