Computational challenges of sequence classification in microbiomic data

Brief Bioinform. 2011 Nov;12(6):614-25. doi: 10.1093/bib/bbr019. Epub 2011 Apr 18.

Abstract

Next-generation sequencing technologies have opened up an unprecedented opportunity for microbiology by enabling the culture-independent genetic study of complex microbial communities, which until now were largely unknown. The analysis of metagenomic data is challenging: a sample may contain a mixture of many different bacterial species whose genomes have not necessarily been sequenced beforehand. We focus on the simpler case of 16S ribosomal RNA metagenomic data, for which databases of reference sequences are available, and survey the computational challenges that must be solved in order to characterize and quantify a sample. In particular, we examine two aspects: how the necessary adoption of new tools geared towards high-throughput analysis affects the quality of the results, and how well various established methods perform at assigning sequence reads to microbial species, with and without taking taxonomic information into account.
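To make the distinction between assignment "with and without taxonomic information" concrete, the following is a minimal sketch and not one of the methods evaluated in the paper. It assumes a toy k-mer Jaccard similarity as the scoring function. The reference sequences, lineage labels, function names, and the `margin` parameter are all hypothetical and chosen only for illustration.

```python
def kmers(seq, k=4):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_all(read, references, k=4):
    """Score a read against every reference sequence."""
    read_kmers = kmers(read, k)
    return {name: jaccard(read_kmers, kmers(seq, k))
            for name, seq in references.items()}


def classify(read, references, k=4):
    """Without taxonomy: return the single best-scoring reference.

    Ties are broken alphabetically, so an ambiguous read is still
    forced onto one species.
    """
    scores = score_all(read, references, k)
    return max(sorted(scores), key=scores.get)


def classify_with_taxonomy(read, references, lineages, k=4, margin=0.05):
    """With taxonomy: return the deepest lineage shared by all near-best hits.

    Every reference scoring within `margin` of the top score counts as a
    hit; the result is the common prefix of the hits' lineages.
    """
    scores = score_all(read, references, k)
    top = max(scores.values())
    hits = [lineages[name] for name, s in scores.items() if s >= top - margin]
    common = []
    for ranks in zip(*hits):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return tuple(common)


# Hypothetical toy data (not real 16S sequences).
references = {
    "sp_A1": "ACGTACGTGGCCTTAA",
    "sp_A2": "ACGTACGTGGCCTTAT",
    "sp_B1": "TTTTGGGGCCCCAAAA",
}
lineages = {
    "sp_A1": ("Bacteria", "genus_A", "sp_A1"),
    "sp_A2": ("Bacteria", "genus_A", "sp_A2"),
    "sp_B1": ("Bacteria", "genus_B", "sp_B1"),
}

ambiguous_read = "ACGTACGTGGCC"   # matches sp_A1 and sp_A2 equally well
specific_read = "GGCCTTAT"        # covers the position where they differ
```

With this toy data, `classify(ambiguous_read, references)` still names a single species, whereas `classify_with_taxonomy(ambiguous_read, references, lineages)` stops at the shared genus. For `specific_read`, both functions resolve to `sp_A2`. In this sketch, the taxonomy-aware variant gives up resolution on ambiguous reads in exchange for not committing to an arbitrary species.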

MeSH terms

  • Archaea / classification
  • Archaea / genetics
  • Bacteria / classification
  • Bacteria / genetics
  • DNA, Bacterial / chemistry
  • Metagenome
  • Metagenomics / methods*
  • RNA, Ribosomal, 16S / chemistry

Substances

  • DNA, Bacterial
  • RNA, Ribosomal, 16S