bioOTU: An Improved Method for Simultaneous Taxonomic Assignments and Operational Taxonomic Units Clustering of 16s rRNA Gene Sequences

J Comput Biol. 2016 Apr;23(4):229-38. doi: 10.1089/cmb.2015.0214. Epub 2016 Mar 7.

Abstract

Clustering of 16s rRNA amplicon sequences into operational taxonomic units (OTUs) is the most common bioinformatics approach for investigating microbial communities with high-throughput sequencing technologies. However, the reliability of existing OTU clustering algorithms still needs improvement. Here we propose an improved method (bioOTU) that first assigns taxonomy to unique tags at the genus level, separating the error-free sequences of known species in the reference database from artifacts, and then clusters them into OTUs by different strategies. The remaining tags, which fail to be clustered in the previous step, are then subjected to independent OTU clustering by an optimized heuristic clustering algorithm. Performance tests on both mock and real communities showed that bioOTU is effective at recovering the underlying profiles in both microbial composition and abundance, and that it produces a comparable or smaller number of OTUs than the prevailing tools Mothur and UPARSE. bioOTU is implemented in C and Python, with source code freely available on GitHub.
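The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the bioOTU implementation: the function names, the exact-match reference lookup (standing in for genus-level taxonomic assignment), the position-wise identity measure (standing in for alignment-based identity), and the identity threshold are all assumptions made for illustration only.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into unique tags with their abundance.

    Returns (sequence, count) pairs, most abundant first.
    """
    counts = Counter(reads)
    # Sort by descending abundance, then by sequence for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def split_by_reference(tags, reference):
    """Separate tags found in a reference from the remaining tags.

    `reference` is a hypothetical dict mapping sequence -> genus; a real
    pipeline would use a taxonomic classifier instead of exact matching.
    """
    known, remaining = {}, []
    for seq, count in tags:
        genus = reference.get(seq)
        if genus is None:
            remaining.append((seq, count))
        else:
            known.setdefault(genus, []).append((seq, count))
    return known, remaining


def identity(a, b):
    """Fraction of matching positions (toy stand-in for alignment identity)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(tags, threshold=0.97):
    """Generic abundance-ordered greedy clustering of (sequence, count) tags.

    Each tag joins the first OTU whose centroid is at least `threshold`
    identical to it; otherwise it seeds a new OTU.
    """
    otus = []
    for seq, count in tags:
        for otu in otus:
            if identity(seq, otu["centroid"]) >= threshold:
                otu["members"].append((seq, count))
                otu["size"] += count
                break
        else:
            # No existing centroid was close enough: start a new OTU.
            otus.append({"centroid": seq, "members": [(seq, count)], "size": count})
    return otus
```

A typical call sequence would be `tags = dereplicate(reads)`, then `known, remaining = split_by_reference(tags, reference)`, then `greedy_cluster(remaining)`. Processing tags in descending abundance means the most frequent sequences become centroids, on the common assumption that abundant tags are less likely to be sequencing artifacts.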

Keywords: 16s rRNA; next-generation sequencing; operational taxonomic units.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / classification*
  • Bacteria / genetics
  • Phylogeny*
  • RNA, Ribosomal, 16S / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Software*

Substances

  • RNA, Ribosomal, 16S