The selection of software and database for metagenomics sequence analysis impacts the outcome of microbial profiling and pathogen detection

PLoS One. 2023 Apr 7;18(4):e0284031. doi: 10.1371/journal.pone.0284031. eCollection 2023.

Abstract

Shotgun metagenomic sequencing analysis is widely used for microbial profiling of biological specimens and for pathogen detection. However, very little is known about the technical biases that the choice of analysis software and databases introduces into the results obtained for a biological specimen. In this study, we evaluated different direct read shotgun metagenomics taxonomic profiling software to characterize, across multiple taxonomic levels, the microbial compositions of simulated mouse gut microbiome samples and of biological samples collected from wild rodents. Using ten of the most widely used metagenomics software tools and four different databases, we demonstrated that obtaining an accurate species-level microbial profile with current direct read metagenomics profiling software remains a challenging task. We also showed that the discrepancies in results between different databases and software could lead to significant variations in the distinct microbial taxa classified, in the characterizations of the microbial communities, and in the differentially abundant taxa identified. Differences in database contents and read profiling algorithms are the main contributors to these discrepancies. Including host genomes and genomes of the taxa of interest in the databases is important for increasing the accuracy of profiling. Our analysis also showed that the software included in this study differed in their ability to detect Leptospira, a major zoonotic pathogen of One Health importance, especially at species-level resolution. We concluded that using different database and software combinations can result in confounding biological conclusions in microbial profiling. Our study indicates that software and database selection must be based on the purpose of the study.
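
As an illustration of how discrepancies between two taxonomic profiles of the same sample might be quantified, the following is a minimal Python sketch, not the method used in the study. It computes a presence/absence Jaccard distance (which taxa were classified) and a Bray-Curtis dissimilarity (how relative abundances differ). The function names, the `Profile` type, the tool labels, and all abundance values are hypothetical and chosen only for the example.

```python
from typing import Dict

# Hypothetical representation: taxon name -> relative abundance (0..1).
Profile = Dict[str, float]


def jaccard_distance(a: Profile, b: Profile) -> float:
    """Presence/absence distance: 1 - |shared taxa| / |all detected taxa|."""
    taxa_a = {t for t, v in a.items() if v > 0}
    taxa_b = {t for t, v in b.items() if v > 0}
    union = taxa_a | taxa_b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(taxa_a & taxa_b) / len(union)


def bray_curtis(a: Profile, b: Profile) -> float:
    """Abundance-based dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Toy profiles of one sample from two hypothetical software/database
# combinations. The values are invented for illustration and are NOT
# results from the study.
profile_tool_a: Profile = {
    "Leptospira interrogans": 0.10,
    "Taxon A": 0.60,
    "Taxon B": 0.30,
}
profile_tool_b: Profile = {
    "Leptospira borgpetersenii": 0.05,
    "Taxon A": 0.70,
    "Taxon B": 0.25,
}

if __name__ == "__main__":
    print("Jaccard distance:", jaccard_distance(profile_tool_a, profile_tool_b))
    print("Bray-Curtis:", bray_curtis(profile_tool_a, profile_tool_b))
```

In this toy case the two profiles agree at the genus level (both report Leptospira) but disagree on the species, so a species-level comparison registers a difference that a genus-level comparison would not.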

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Metagenome
  • Metagenomics* / methods
  • Mice
  • Microbiota* / genetics
  • Sequence Analysis, DNA / methods
  • Software

Grants and funding

The sequence analysis work was supported by the National Science Foundation under Grant No. DGE-1545433 to R.X. and by startup funds to L.C.M.S. from the University of Georgia Office of Research. The sample collection, sequencing, and analysis were done during S.R.’s tenure at the Ross University School of Veterinary Medicine, Saint Kitts, and were supported by internal grants from the Center for One Health and Tropical Medicine.