RiboTaxa: combined approaches for rRNA genes taxonomic resolution down to the species level from metagenomics data revealing novelties

NAR Genom Bioinform. 2022 Sep 21;4(3):lqac070. doi: 10.1093/nargab/lqac070. eCollection 2022 Sep.

Abstract

Metagenomic classifiers are widely used for the taxonomic profiling of metagenomics data and the estimation of taxa relative abundance. Small subunit rRNA genes are a gold standard for phylogenetic resolution of microbiota, although the power of this marker depends on using the full-length gene sequence. We aimed to identify the tools that can efficiently achieve taxonomic resolution down to the species level. To this end, we benchmarked the performance and accuracy of rRNA-specialized versus general-purpose read mappers, reference-targeted assemblers and taxonomic classifiers. We then combined the best-performing tools (BBTools, FastQC, SortMeRNA, MetaRib, EMIRGE, VSEARCH, BBMap and QIIME 2's Sklearn classifier) into a pipeline called RiboTaxa. On metagenomics datasets, RiboTaxa outperformed other tools (i.e. Kraken2, Centrifuge, METAXA2, phyloFlash, SPINGO, BLCA, MEGAN), providing precise taxonomic identification and relative abundance description without false-positive detection (F-measure of 100% at the genus level and 83.7% at the species level). On real datasets from various environments (i.e. ocean, soil, human gut) and from different approaches (e.g. metagenomics and gene capture by hybridization), RiboTaxa revealed microbial novelties not discerned by current bioinformatics analyses, opening new biological perspectives in human and environmental health.
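
The F-measure reported above is conventionally the harmonic mean of precision and recall computed over the detected taxa. The following is a minimal sketch of that calculation, assuming taxa are compared as simple sets of names at a given rank; the function name `f_measure` and the toy species lists are hypothetical illustrations and are not taken from the RiboTaxa benchmark or its code.

```python
def f_measure(expected, predicted):
    """Return (precision, recall, F-measure) for two collections of taxon names.

    Assumption: a taxon counts as a true positive when its name appears in
    both the expected (ground-truth) and predicted sets at the same rank.
    """
    expected, predicted = set(expected), set(predicted)
    tp = len(expected & predicted)   # taxa correctly detected
    fp = len(predicted - expected)   # taxa reported but not present
    fn = len(expected - predicted)   # taxa present but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (hypothetical community, not data from the paper):
expected_species = [
    "Escherichia coli",
    "Bacillus subtilis",
    "Staphylococcus aureus",
    "Pseudomonas aeruginosa",
]
predicted_species = [
    "Escherichia coli",
    "Bacillus subtilis",
    "Staphylococcus aureus",
]

precision, recall, f1 = f_measure(expected_species, predicted_species)
print(f"precision={precision:.3f} recall={recall:.3f} F={f1:.3f}")
```

In this toy case no false positive is reported, so precision is 1.0, while one missed species lowers recall and therefore the F-measure. This is the same pattern as a profile with no false-positive detection but an F-measure below 100%.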