Pseudoalignment for metagenomic read assignment

Bioinformatics. 2017 Jul 15;33(14):2082-2088. doi: 10.1093/bioinformatics/btx106.

Abstract

Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, and reads are typically assigned to taxonomic levels where they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains.

Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When pseudoalignment is coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
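
To illustrate how an EM step can resolve reads that are compatible with several genomes, the following is a minimal sketch and not the authors' pipeline. It assumes pseudoalignment has already reduced each read to the set of genomes it is compatible with (an equivalence class), and it ignores genome length for simplicity. The function name `em_abundances`, its parameters, and the toy counts are hypothetical and chosen only for this example.

```python
def em_abundances(eq_class_counts, n_genomes, n_iter=200, tol=1e-10):
    """Estimate relative genome abundances from equivalence-class counts.

    eq_class_counts maps a tuple of compatible genome indices to the
    number of reads that produced exactly that compatibility set.
    Assumption for this sketch: genome lengths are not modeled.
    """
    total = sum(eq_class_counts.values())
    # Start from a uniform guess over all genomes.
    theta = [1.0 / n_genomes] * n_genomes
    for _ in range(n_iter):
        assigned = [0.0] * n_genomes
        # E-step: split each class's reads among its compatible genomes
        # in proportion to the current abundance estimates.
        for genomes, count in eq_class_counts.items():
            denom = sum(theta[g] for g in genomes)
            if denom == 0.0:
                continue  # no compatible genome has any support left
            for g in genomes:
                assigned[g] += count * theta[g] / denom
        # M-step: new abundances are the normalized expected read counts.
        new_theta = [a / total for a in assigned]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy input (hypothetical): 60 reads unique to genome 0, 20 unique to
# genome 1, 20 ambiguous between genomes 0 and 1, none hitting genome 2.
toy_counts = {(0,): 60, (1,): 20, (0, 1): 20}
toy_theta = em_abundances(toy_counts, n_genomes=3)
```

On this toy input the estimates settle near 0.75 for genome 0 and 0.25 for genome 1, with genome 2 at zero: the ambiguous reads are shared out in the same 3:1 ratio that the unambiguous reads suggest, rather than being pushed up to a higher taxonomic level.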

Availability and implementation: Pipeline and analysis code can be downloaded from http://github.com/pachterlab/metakallisto.

Contact: lpachter@math.berkeley.edu.

MeSH terms

  • Algorithms
  • Bacteria / genetics*
  • Bacteria / isolation & purification
  • Genome, Bacterial*
  • Metagenomics / methods*
  • Sequence Analysis, DNA / methods*
  • Sequence Analysis, RNA / methods
  • Software*