Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing

Nucleic Acids Res. 2017 Feb 28;45(4):e23. doi: 10.1093/nar/gkw984.

Abstract

High-throughput sequencing of 16S rRNA gene amplicons (16S-seq) has become a widely deployed method for profiling complex microbial communities, but technical pitfalls related to data reliability and quantification remain to be fully addressed. In this work, we developed and implemented a set of synthetic 16S rRNA genes to serve as universal spike-in standards for 16S-seq experiments. The spike-ins are full-length 16S rRNA genes containing artificial variable regions with negligible identity to known nucleotide sequences, permitting unambiguous identification of spike-in sequences in 16S-seq read data from any microbiome sample. Using defined mock communities and environmental microbiota, we characterized the performance of the spike-in standards and demonstrated their utility for evaluating data quality on a per-sample basis. Further, we showed that staggered spike-in mixtures added at the point of DNA extraction enable concurrent estimation of absolute microbial abundances suitable for comparative analysis. The results also underscored that template-specific Illumina sequencing artifacts may bias the perceived abundance of certain taxa. Taken together, the spike-in standards represent a novel bioanalytical tool that can substantially improve 16S-seq-based microbiome studies by enabling comprehensive quality control along with absolute quantification.
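
The two uses of the spike-ins described above, per-sample quality control and absolute quantification, can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that a staggered spike-in mixture with known copy numbers was added to one sample, that reads have already been assigned to spike-ins and taxa, and that a single copies-per-read ratio derived from the spike-ins can be applied to all taxa. The function names (`fit_loglog`, `estimate_absolute`) and all numbers are hypothetical.

```python
import math


def fit_loglog(added_copies, observed_reads):
    """Least-squares fit of log10(reads) = slope * log10(copies) + intercept.

    Returns (slope, intercept, r_squared). In this sketch, a slope near 1 and
    an r_squared near 1 are taken as a sign that spike-in read counts track
    the added copy numbers proportionally in this sample.
    """
    xs = [math.log10(c) for c in added_copies]
    ys = [math.log10(r) for r in observed_reads]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r_squared


def estimate_absolute(taxon_reads, spike_reads_total, spike_copies_total):
    """Scale per-taxon read counts to estimated 16S rRNA gene copies.

    Assumption (not from the source): one copies-per-read factor, computed
    from the pooled spike-ins, applies equally to every taxon.
    """
    copies_per_read = spike_copies_total / spike_reads_total
    return {taxon: reads * copies_per_read for taxon, reads in taxon_reads.items()}


if __name__ == "__main__":
    # Hypothetical staggered mixture: copies added and reads recovered.
    added = [1e3, 1e4, 1e5, 1e6]
    reads = [10, 100, 1000, 10000]
    slope, intercept, r2 = fit_loglog(added, reads)
    print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.3f}")

    # Hypothetical read counts for two taxa in the same sample.
    sample = {"taxon_A": 5000, "taxon_B": 250}
    estimates = estimate_absolute(sample, sum(reads), sum(added))
    for taxon, copies in estimates.items():
        print(f"{taxon}: {copies:.0f} estimated 16S copies")
```

The result is in 16S rRNA gene copies per sample, not cells, because 16S copy number per genome varies between taxa. The abstract's caution about template-specific sequencing artifacts also applies: a single scaling factor cannot correct a bias that affects particular taxa.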

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bacteria / classification
  • Bacteria / genetics
  • Computational Biology
  • Environmental Microbiology
  • High-Throughput Nucleotide Sequencing*
  • Metagenome*
  • Metagenomics* / methods
  • Metagenomics* / standards
  • Microbiota
  • RNA, Ribosomal, 16S / genetics*
  • Reference Standards

Substances

  • RNA, Ribosomal, 16S