Computational challenges of sequence classification in microbiomic data

Paolo Ribeca; Gabriel Valiente

doi:10.1093/bib/bbr019

Brief Bioinform. 2011 Nov;12(6):614-25. doi: 10.1093/bib/bbr019. Epub 2011 Apr 18.

Authors

Paolo Ribeca¹, Gabriel Valiente

Affiliation

¹ Spanish National Center for Genomic Analysis (CNAG), Barcelona, Spain.

PMID: 21504986
DOI: 10.1093/bib/bbr019

Abstract

Next-generation sequencing technologies have opened up an unprecedented opportunity for microbiology by enabling the culture-independent genetic study of complex microbial communities, which until now were largely unknown. The analysis of metagenomic data is challenging: a sample may contain a mixture of many different bacterial species whose genomes have not necessarily been sequenced beforehand. Focusing on the simpler case of 16S ribosomal RNA metagenomic data, for which databases of reference sequences are available, we survey the computational challenges that must be solved in order to characterize and quantify a sample. In particular, we examine two aspects: how the necessary adoption of new tools geared towards high-throughput analysis affects the quality of the results, and how well various established methods perform at assigning sequence reads to microbial species, with and without taking taxonomic information into account.

MeSH terms

Archaea / classification
Archaea / genetics
Bacteria / classification
Bacteria / genetics
DNA, Bacterial / chemistry
Metagenome
Metagenomics / methods*
RNA, Ribosomal, 16S / chemistry

Substances

DNA, Bacterial
RNA, Ribosomal, 16S