Advanced computational algorithms for microbial community analysis using massive 16S rRNA sequence data

Nucleic Acids Res. 2010 Dec;38(22):e205. doi: 10.1093/nar/gkq872. Epub 2010 Oct 6.

Abstract

With the aid of next-generation sequencing technology, researchers can now obtain millions of microbial signature sequences for diverse applications ranging from human epidemiological studies to global ocean surveys. Developing advanced computational strategies that extract as much pertinent information as possible from massive nucleotide data has become a major focus of the bioinformatics community. Here, we describe a novel analytical strategy, including discriminant and topology analyses, that enables researchers to investigate microbial communities in depth, well beyond basic microbial diversity estimation. We demonstrate the utility of our approach through a computational study performed on a previously published massive human gut 16S rRNA data set. The application of discriminant and topology analyses enabled us to derive quantitative disease-associated microbial signatures and to describe microbial community structure in far more detail than previously achievable. Our approach provides rigorous statistical tools for sequence-based studies aimed at elucidating associations between known or unknown organisms and a variety of physiological or environmental conditions.
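
The abstract does not say which discriminant method the authors used, so the following is only an illustrative sketch of the general idea: finding a direction in taxon-abundance space that separates two groups of samples. It uses Fisher's two-class linear discriminant as a stand-in technique. The two features ("taxon A" and "taxon B" relative abundances), the group names, and every numeric value are hypothetical toy data, not results from the study.

```python
from statistics import mean

# Hypothetical toy data: each sample is (relative abundance of taxon A, taxon B).
# These values are invented for illustration and are NOT taken from the paper.
GROUP_1 = [(0.60, 0.20), (0.55, 0.30), (0.70, 0.25), (0.65, 0.15)]
GROUP_2 = [(0.30, 0.50), (0.40, 0.45), (0.35, 0.60), (0.25, 0.55)]


def centroid(samples):
    """Mean abundance vector of a group of two-feature samples."""
    return [mean(s[0] for s in samples), mean(s[1] for s in samples)]


def scatter(samples, mu):
    """2x2 within-group scatter matrix (sum of outer products of deviations)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for sample in samples:
        d = [sample[0] - mu[0], sample[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(group_1, group_2):
    """Fisher discriminant direction w = Sw^-1 (mu_1 - mu_2) for two features."""
    mu_1, mu_2 = centroid(group_1), centroid(group_2)
    s1, s2 = scatter(group_1, mu_1), scatter(group_2, mu_2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-group scatter matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_1[0] - mu_2[0], mu_1[1] - mu_2[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    return w, mu_1, mu_2


def project(w, sample):
    """Discriminant score: projection of a sample onto direction w."""
    return w[0] * sample[0] + w[1] * sample[1]


W, MU_1, MU_2 = fisher_direction(GROUP_1, GROUP_2)
# Decision threshold: projection of the midpoint between the two centroids.
THRESHOLD = project(W, [(MU_1[0] + MU_2[0]) / 2, (MU_1[1] + MU_2[1]) / 2])

if __name__ == "__main__":
    for label, group in (("group 1", GROUP_1), ("group 2", GROUP_2)):
        for sample in group:
            side = "above" if project(W, sample) > THRESHOLD else "below"
            print(label, sample, side, "threshold")
```

In this sketch, the signs and magnitudes of the components of `W` indicate which taxon pushes a sample toward which group, which is one simple way a quantitative group-associated signature could be read off. Real 16S data sets have far more taxa than samples, so a practical analysis would need regularization or dimensionality reduction, which this toy example omits.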

MeSH terms

  • Algorithms*
  • Bacteria / classification
  • Bacteria / genetics
  • Bacteria / isolation & purification
  • Computational Biology / methods
  • Discriminant Analysis
  • Gastrointestinal Tract / microbiology
  • Humans
  • Metagenome*
  • Obesity / microbiology
  • Phylogeny
  • RNA, Ribosomal, 16S / genetics*
  • Sequence Analysis, DNA / methods*

Substances

  • RNA, Ribosomal, 16S