MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data

Bioinformatics. 2016 Sep 15;32(18):2760-7. doi: 10.1093/bioinformatics/btw312. Epub 2016 Jun 3.

Abstract

Motivation: High-throughput metagenomic sequencing has revolutionized our view of the structure and metabolic potential of microbial communities. However, analysis of metagenomic composition is often complicated by the high complexity of the community and the lack of related reference genomic sequences. As a starting point for comparative metagenomic analysis, researchers require efficient means of assessing the pairwise similarity of metagenomes (beta-diversity). A number of approaches have been used to address this task; however, most of them have inherent disadvantages that limit their scope of applicability. For instance, reference-based methods perform poorly on metagenomes from previously unstudied niches, while composition-based methods appear to be too abstract for straightforward interpretation and do not allow differentially abundant features to be identified.

Results: We developed MetaFast, an approach that represents a shotgun metagenome from an arbitrary environment as a modified de Bruijn graph consisting of simplified components. For multiple metagenomes, the resulting representation is used to obtain a pairwise similarity matrix. The dimensional structure of the metagenomic components preserved in our algorithm reflects the inherent subspecies-level diversity of microbiota. The method is computationally efficient and especially promising for the analysis of metagenomes from novel environmental niches.
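
The general idea of comparing metagenomes through shared de Bruijn graph components can be illustrated with a minimal Python sketch. It is not the MetaFast implementation: the function names, the toy reads, the value of k, the use of plain connected components in place of the "simplified components" of the paper, and the choice of Bray-Curtis dissimilarity are all assumptions made for illustration. The sketch also ignores reverse complements and sequencing errors, and it reports dissimilarity (0 for identical profiles) rather than similarity.

```python
from collections import Counter
from itertools import combinations


def kmers(read, k):
    """Yield every k-mer of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_kmers(reads, k):
    """Count k-mer occurrences over all reads of one sample."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def graph_components(all_kmers):
    """Map each k-mer to its connected component in a de Bruijn graph.

    Nodes are (k-1)-mers; each k-mer is an edge joining its prefix
    and its suffix. Plain connected components are an assumption
    standing in for the paper's simplified components.
    """
    parent = {}
    for kmer in all_kmers:
        for node in (kmer[:-1], kmer[1:]):
            parent.setdefault(node, node)
        a, b = find(parent, kmer[:-1]), find(parent, kmer[1:])
        if a != b:
            parent[a] = b
    return {kmer: find(parent, kmer[:-1]) for kmer in all_kmers}


def component_profile(counts, component_of):
    """Sum a sample's k-mer counts per graph component."""
    profile = Counter()
    for kmer, n in counts.items():
        profile[component_of[kmer]] += n
    return profile


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two count profiles."""
    keys = set(p) | set(q)
    shared = sum(min(p[key], q[key]) for key in keys)
    total = sum(p.values()) + sum(q.values())
    return 1.0 - 2.0 * shared / total if total else 0.0


def dissimilarity_matrix(samples, k=5):
    """Return sorted sample names and a dict-based pairwise matrix."""
    names = sorted(samples)
    counts = {name: count_kmers(samples[name], k) for name in names}
    all_kmers = set().union(*counts.values())
    component_of = graph_components(all_kmers)
    profiles = {n: component_profile(counts[n], component_of) for n in names}
    matrix = {(a, a): 0.0 for a in names}
    for a, b in combinations(names, 2):
        matrix[a, b] = matrix[b, a] = bray_curtis(profiles[a], profiles[b])
    return names, matrix


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    toy = {"A": ["ACGTACGT"], "B": ["ACGTACGT"], "C": ["TTTTTTTT"]}
    names, matrix = dissimilarity_matrix(toy, k=4)
    for a in names:
        print(a, [round(matrix[a, b], 2) for b in names])
```

In this toy input, samples A and B contain the same read and sample C shares no k-mers with either, so A and B land in one component and C in another. Because the graph is built from the union of all samples, every sample is described over the same set of components, which is what makes the per-sample profiles directly comparable.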

Availability and implementation: Source code and binaries are freely available for download at https://github.com/ctlab/metafast. The code is written in Java and is platform-independent (tested on Linux and Windows x86_64).

Contact: ulyantsev@rain.ifmo.ru

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Databases, Genetic
  • Metagenome
  • Metagenomics*
  • Microbiota