metaBIT, an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data

Mol Ecol Resour. 2016 Nov;16(6):1415-1427. doi: 10.1111/1755-0998.12546. Epub 2016 Jun 22.

Abstract

Micro-organisms account for most of the Earth's biodiversity and yet remain largely unknown. The complexity and diversity of microbial communities present in clinical and environmental samples can now be robustly investigated faster and at lower cost than ever before, thanks to recent advances in high-throughput DNA sequencing (HTS). Here, we develop metaBIT, an open-source computational pipeline automating routine microbial profiling of shotgun HTS data. Customizable by the user at different stringency levels, it performs robust taxonomy-based assignment and relative abundance calculation of microbial taxa, as well as cross-sample statistical analyses of microbial diversity distributions. We demonstrate the versatility of metaBIT across a range of published HTS data sets sampled from the environment (soil and seawater) and the human body (skin and gut), as well as from archaeological specimens. We present the diversity of outputs provided by the pipeline for the visualization of microbial profiles (barplots, heatmaps) and for their characterization and comparison (diversity indices, hierarchical clustering and principal coordinates analyses). We show that metaBIT allows automatic, fast and user-friendly profiling of the microbial DNA present in HTS shotgun data sets. The applications of metaBIT are broad, ranging from the monitoring of laboratory errors and contamination to the reconstruction of past and present microbiota and the detection of candidate species, including pathogens.

Keywords: ancient DNA; metagenomics; microbial profiling; microbiome; shotgun sequencing.
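
The relative abundance and diversity calculations named in the abstract can be illustrated with a minimal sketch. This is not metaBIT's code and makes no claim about how the pipeline implements these steps; it assumes a simple per-sample table of read counts per taxon, and the function names, taxon labels and counts are all hypothetical. The Shannon index and Bray-Curtis dissimilarity are chosen here as common examples of a diversity index and of a between-sample distance that could feed hierarchical clustering or a principal coordinates analysis.

```python
import math


def relative_abundance(counts):
    """Convert raw per-taxon counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with p > 0."""
    proportions = relative_abundance(counts).values()
    return -sum(p * math.log(p) for p in proportions if p > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two samples, computed on
    relative abundances (0 = identical profiles, 1 = no shared taxa)."""
    ra = relative_abundance(counts_a)
    rb = relative_abundance(counts_b)
    taxa = set(ra) | set(rb)
    numerator = sum(abs(ra.get(t, 0.0) - rb.get(t, 0.0)) for t in taxa)
    denominator = sum(ra.get(t, 0.0) + rb.get(t, 0.0) for t in taxa)
    return numerator / denominator


# Hypothetical count tables for two samples (illustrative values only).
soil = {"Taxon_A": 50, "Taxon_B": 30, "Taxon_C": 20}
gut = {"Taxon_A": 10, "Taxon_D": 90}

if __name__ == "__main__":
    print(relative_abundance(soil))
    print(shannon_index(soil), shannon_index(gut))
    print(bray_curtis(soil, gut))
```

A matrix of such pairwise dissimilarities across all samples is the usual input to clustering and ordination methods of the kind the abstract lists.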

Publication types

  • Evaluation Study

MeSH terms

  • Automation
  • Computational Biology / methods*
  • Environmental Microbiology*
  • High-Throughput Nucleotide Sequencing / methods*
  • Metagenomics / methods*
  • Microbiota*
  • Sequence Analysis, DNA / methods*