Classification of 16S rRNA reads is improved using a niche-specific database constructed by near-full length sequencing

PLoS One. 2020 Jul 13;15(7):e0235498. doi: 10.1371/journal.pone.0235498. eCollection 2020.

Abstract

Surveys of microbial populations in environmental niches of interest often rely on sequence variation in the gene encoding the ribosomal small subunit (the 16S rRNA gene). Generally, these surveys use semi-degenerate primers to amplify portions of the 16S gene from a subset of bacterial species, sequence the amplicons in bulk, and assign the reads to putative taxonomic categories by comparison to databases purporting to connect specific sequences in the main variable regions of the gene to specific organisms. Because of the sequence length constraints of the most popular bulk sequencing platforms, the selected primers amplify only one to three of the nine variable regions, so taxonomic assignment is based on relatively short stretches of sequence (150-500 bases). We demonstrate that taxonomic assignment improves, with fewer unassigned reads, when the reference includes a survey of near-full-length sequences specific to the target environment, using the upper respiratory tract (URT) of cattle as the niche of interest. We created a custom Bovine URT database from these longer sequences and used it to assign shorter, less expensive reads in comparisons of the upper respiratory tract among individual animals. This process improves both the ability to detect changes in the microbial populations of a given environment and the accuracy with which the content of that environment can be defined at increasingly higher taxonomic resolution.
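The effect described above, fewer unassigned reads once niche-specific references are added, can be illustrated with a minimal sketch. This is not the authors' pipeline: the shared k-mer score stands in for a real classifier, and the sequences, taxon labels, function names, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int = 8) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign(read: str, db: Dict[str, str], k: int = 8,
           min_shared: float = 0.8) -> Optional[str]:
    """Return the best-matching taxon label, or None if the read is unassigned.

    The score is the fraction of the read's k-mers found in a reference.
    This is a toy stand-in for a real 16S classifier (assumption).
    """
    read_kmers = kmers(read, k)
    if not read_kmers:
        return None
    best_taxon, best_score = None, 0.0
    for taxon, ref in db.items():
        score = len(read_kmers & kmers(ref, k)) / len(read_kmers)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_shared else None


def unassigned_fraction(reads: List[str], db: Dict[str, str]) -> float:
    """Fraction of reads that receive no taxonomic label."""
    unassigned = sum(1 for read in reads if assign(read, db) is None)
    return unassigned / len(reads)


# Made-up sequences: one "general" reference and one "niche-specific" reference.
generic_db = {"Taxon_A": "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"}
niche_refs = {"Taxon_B": "TTGACCGGTAAGCTTCGATCGGATTACGCAGT"}
combined_db = {**generic_db, **niche_refs}

# Short reads drawn from each reference, mimicking partial-gene amplicons.
reads = [generic_db["Taxon_A"][4:24], niche_refs["Taxon_B"][5:25]]

print("generic only:", unassigned_fraction(reads, generic_db))
print("with niche refs:", unassigned_fraction(reads, combined_db))
```

In this toy setup the read derived from the niche-specific reference has no acceptable match in the general database alone, so it is counted as unassigned until the longer niche-derived sequence is added to the reference set.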

MeSH terms

  • Animals
  • Cattle
  • Databases, Genetic*
  • RNA, Ribosomal, 16S / genetics*
  • Reference Standards
  • Sequence Analysis, RNA / methods*
  • Sequence Analysis, RNA / standards

Substances

  • RNA, Ribosomal, 16S

Grants and funding

The authors received no funding for this work.