Pseudoalignment for metagenomic read assignment

L Schaeffer; H Pimentel; N Bray; P Melsted; L Pachter

doi:10.1093/bioinformatics/btx106

Bioinformatics. 2017 Jul 15;33(14):2082-2088. doi: 10.1093/bioinformatics/btx106.

Authors

L Schaeffer¹, H Pimentel², N Bray³, P Melsted⁴, L Pachter¹,⁵

Affiliations

¹ Department of Molecular and Cell Biology, UC Berkeley, Berkeley, CA, USA.
² Department of Genetics, Stanford University, Stanford, CA, USA.
³ Department of Molecular and Cell Biology and Innovative Genomics Institute, UC Berkeley, Berkeley, CA, USA.
⁴ Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavik, Iceland.
⁵ Departments of Mathematics and Computer Science, UC Berkeley, Berkeley, CA, USA.

Abstract

Motivation: Read assignment is an important first step in many metagenomic analysis workflows, providing the basis for identification and quantification of species. However, ambiguity among the sequences of many strains makes it difficult to assign reads at the lowest level of taxonomy, so reads are typically assigned to the taxonomic levels at which they are unambiguous. We explore connections between metagenomic read assignment and the quantification of transcripts from RNA-Seq data in order to develop novel methods for rapid and accurate quantification of metagenomic strains.
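The conventional strategy mentioned above, assigning an ambiguous read to the level of the taxonomy at which it is unambiguous, amounts to placing the read at the lowest common ancestor of every taxon it matches. The following is a minimal sketch, assuming a hypothetical toy taxonomy stored as a child-to-parent map. It illustrates the baseline the abstract contrasts with, not the method proposed in the paper, and the taxon names, `PARENT`, and helper functions are illustrative only.

```python
# Hypothetical toy taxonomy: each taxon maps to its parent (the root maps to None).
PARENT = {
    "root": None,
    "Escherichia": "root",
    "E. coli": "Escherichia",
    "E. coli K-12": "E. coli",
    "E. coli O157:H7": "E. coli",
    "E. fergusonii": "Escherichia",
}


def lineage(node):
    """Return the path from node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def lowest_common_ancestor(nodes):
    """Deepest taxon that appears in the lineage of every node."""
    nodes = list(nodes)
    shared = set(lineage(nodes[0]))
    for n in nodes[1:]:
        shared &= set(lineage(n))
    # Walk up from the first node; the first shared taxon is the deepest one.
    for n in lineage(nodes[0]):
        if n in shared:
            return n
    raise ValueError("nodes do not share a root")


# A read matching both E. coli strains can only be placed at species level.
print(lowest_common_ancestor(["E. coli K-12", "E. coli O157:H7"]))  # E. coli
# A read shared across species is pushed further up, to the genus.
print(lowest_common_ancestor(["E. coli K-12", "E. fergusonii"]))    # Escherichia
```

The cost of this strategy is visible in the second call: strain-level information is discarded whenever a read is shared, which is exactly the loss the paper aims to avoid.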

Results: We find that the recent idea of pseudoalignment introduced in the RNA-Seq context is highly applicable in the metagenomics setting. When coupled with the Expectation-Maximization (EM) algorithm, reads can be assigned far more accurately and quickly than is currently possible with state-of-the-art software, making it possible and practical for the first time to analyze abundances of individual genomes in metagenomics projects.
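The combination of pseudoalignment and EM can be sketched in Python as below. This is a toy illustration under stated assumptions, not the metakallisto pipeline or kallisto's implementation: the index is a plain hash map from k-mer to genome names (kallisto uses a colored de Bruijn graph), only the forward strand is considered, and the genomes, reads, and helper names (`build_index`, `pseudoalign`, `em`) are hypothetical. A read's pseudoalignment is the intersection of the genome sets of its k-mers; reads with the same compatibility set form an equivalence class, and EM splits each class's reads among its genomes in proportion to the current abundance estimate divided by genome length.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping k-mers of seq (forward strand only, for simplicity)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(genomes, k):
    """Hypothetical index: map each k-mer to the set of genomes containing it."""
    index = {}
    for name, seq in genomes.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)
    return index


def pseudoalign(read, index, k):
    """Intersect the genome sets of the read's k-mers (empty if any k-mer is unseen)."""
    compatible = None
    for km in kmers(read, k):
        hits = index.get(km, set())
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            return frozenset()
    return frozenset(compatible or ())


def em(class_counts, lengths, n_iter=200):
    """EM over equivalence classes; returns estimated read counts per genome."""
    names = sorted(lengths)
    total = sum(class_counts.values())
    alpha = {t: total / len(names) for t in names}  # uniform starting point
    for _ in range(n_iter):
        new = {t: 0.0 for t in names}
        for ec, count in class_counts.items():
            # E-step: split the class's reads in proportion to alpha_t / length_t.
            weights = {t: alpha[t] / lengths[t] for t in ec}
            denom = sum(weights.values())
            if denom == 0:
                continue
            for t, w in weights.items():
                new[t] += count * w / denom
        alpha = new  # M-step: expected counts become the new estimates.
    return alpha


genomes = {"A": "AAACCCGGG", "B": "AAACCCTTT"}  # hypothetical toy genomes
K = 3
index = build_index(genomes, K)
reads = ["AAACCC", "CCCGGG", "CCCGGG", "CCCTTT"]
classes = Counter(pseudoalign(r, index, K) for r in reads)
classes.pop(frozenset(), None)  # drop reads compatible with no genome
lengths = {name: len(seq) for name, seq in genomes.items()}
print(em(classes, lengths))  # A ≈ 2.67, B ≈ 1.33
```

Because EM needs only the count of each equivalence class, a read shared by several strains does not have to be pushed up the taxonomy: here the shared read "AAACCC" is apportioned between A and B using the evidence from the reads unique to each genome.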

Availability and implementation: Pipeline and analysis code can be downloaded from http://github.com/pachterlab/metakallisto.

Contact: lpachter@math.berkeley.edu.

MeSH terms

Algorithms
Bacteria / genetics*
Bacteria / isolation & purification
Genome, Bacterial*
Metagenomics / methods*
Sequence Analysis, DNA / methods*
Sequence Analysis, RNA / methods
Software*
