CAMITAX: Taxon labels for microbial genomes

Gigascience. 2020 Jan 1;9(1):giz154. doi: 10.1093/gigascience/giz154.

Abstract

Background: The number of microbial genome sequences is increasing exponentially, driven in particular by recent advances in recovering complete or near-complete genomes from metagenomes and single cells. Assigning reliable taxon labels to these genomes is essential and often a prerequisite for downstream analyses.

Findings: We introduce CAMITAX, a scalable and reproducible workflow for the taxonomic labelling of microbial genomes recovered from isolates, single cells, and metagenomes. CAMITAX combines genome distance-, 16S ribosomal RNA gene-, and gene homology-based taxonomic assignments with phylogenetic placement. It uses Nextflow to orchestrate reference databases and software containers, and thus combines ease of installation and use with computational reproducibility. We evaluated the method on several hundred metagenome-assembled genomes with high-quality taxonomic annotations from the TARA Oceans project and found that the ensemble classification method in CAMITAX improved on all individual methods across the tested ranks.
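
To illustrate the general idea of combining per-method taxonomic assignments into a single label, the following minimal Python sketch reports the deepest lineage on which all methods that made a call agree. This is a simplified stand-in, not the ensemble strategy implemented in CAMITAX, which the abstract does not specify. The function name, the method keys, and the example lineages are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

# Rank order assumed for this sketch; lineages are listed from domain downwards.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def consensus_lineage(assignments: Dict[str, Optional[List[str]]]) -> List[str]:
    """Return the longest lineage prefix shared by all methods that made a call.

    Methods with no assignment (None or an empty list) are ignored.
    A method that stops at a shallow rank limits the consensus to that rank,
    which is a deliberately conservative (hypothetical) design choice.
    """
    lineages = [lineage for lineage in assignments.values() if lineage]
    if not lineages:
        return []
    consensus: List[str] = []
    # zip() stops at the shortest lineage, so only commonly resolved ranks are compared.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break  # first disagreement ends the consensus
        consensus.append(taxa_at_rank[0])
    return consensus


# Invented example: three methods make a call, one makes none.
example = {
    "genome_distance": ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
                        "Alteromonadales", "Alteromonadaceae", "Alteromonas"],
    "16S_rRNA": ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
                 "Alteromonadales"],
    "gene_homology": ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
                      "Alteromonadales", "Alteromonadaceae", "genus_X"],
    "phylogenetic_placement": None,
}

label = consensus_lineage(example)
for rank, taxon in zip(RANKS, label):
    print(f"{rank}: {taxon}")
```

In this invented example the consensus stops at the order rank, because the 16S rRNA-based call resolves no deeper than that. A practical workflow would likely weight methods differently or resolve conflicts against a reference taxonomy rather than rely on a plain common prefix.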

Conclusions: While we initially developed CAMITAX to aid the Critical Assessment of Metagenome Interpretation (CAMI) initiative, it evolved into a comprehensive software package to reliably assign taxon labels to microbial genomes. CAMITAX is available under Apache License 2.0 at https://github.com/CAMI-challenge/CAMITAX.

Keywords: CAMI; Docker; Genome Taxonomy; Nextflow; Phylogenetic Placement; Reproducible Research.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA Barcoding, Taxonomic / methods*
  • Databases, Genetic
  • Genome, Microbial*
  • Metagenome*
  • Metagenomics / methods*
  • Phylogeny
  • RNA, Ribosomal, 16S / genetics

Substances

  • RNA, Ribosomal, 16S