Managing Contamination and Diverse Bacterial Loads in 16S rRNA Deep Sequencing of Clinical Samples: Implications of the Law of Small Numbers

mBio. 2021 Jun 29;12(3):e0059821. doi: 10.1128/mBio.00598-21. Epub 2021 Jun 8.

Abstract

In this article, we investigate patterns of microbial DNA contamination in targeted 16S rRNA amplicon sequencing (16S deep sequencing) and demonstrate how these patterns can be used to filter background bacterial DNA in diagnostic microbiology. We also investigate the importance of sequencing depth. We first determined the patterns of contamination by performing repeat 16S deep sequencing of negative and positive extraction controls. This process identified a few bacterial species that dominated across all replicates, but also high intersample variability among low-abundance contaminant species in replicates split before PCR amplification. Replicates split after PCR amplification yielded almost identical sequencing results. On the basis of these observations, we suggest using the abundance of the most dominant contaminant species to define a threshold in each clinical sample, below which identifications possibly represent contamination. We evaluated this approach by sequencing a diluted, staggered mock community and bile samples from 41 patients with acute cholangitis and noninfectious bile duct stenosis. All clinical samples were sequenced twice using different sequencing depths. We were able to demonstrate the following: (i) The high intersample variability between sequencing replicates is caused by events occurring before or during the PCR amplification step. (ii) Knowledge about the most dominant contaminant species can be used to establish sample-specific cutoffs for reliable identifications. (iii) Below the level of the most abundant contaminant, it rapidly becomes very demanding to reliably discriminate between background and true findings. (iv) Adequate sequencing depth can be claimed only when the analysis also picks up background contamination.

IMPORTANCE There has been a gradual increase in 16S deep sequencing studies on infectious disease materials. Management of bacterial DNA contamination is a major challenge in such diagnostics, particularly in low-biomass samples. Reporting a contaminant species as a relevant pathogen may cause unnecessary antibiotic treatment or even falsely classify a noninfectious condition as a bacterial infection. Yet, there are few studies on how to filter contamination in clinical microbiology. Here, we demonstrate that sequencing of extraction controls will not reveal the full spectrum of contaminants that could occur in the associated clinical samples. Only the most abundant contaminant species were consistently detected, and we present how this can be used to set sample-specific thresholds for reliable identifications. We believe this work can facilitate the implementation of 16S deep sequencing in diagnostic laboratories. The new data we provide on the patterns of microbial DNA contamination are also important for microbiome research.
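
The sample-specific cutoff described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name, the labels, the taxon names, and the read counts are all hypothetical. The sketch assumes that the dominant contaminant species are already known from extraction controls, that abundance is measured as read counts per taxon, and that a taxon must be strictly more abundant than the threshold to pass. It also treats a sample with no detected contaminant as one where no threshold can be set, which is one possible reading of point (iv).

```python
from typing import Dict, Iterable, NamedTuple


class ThresholdResult(NamedTuple):
    threshold: int           # read count of the most abundant known contaminant
    depth_adequate: bool     # True only if a known contaminant was detected
    labels: Dict[str, str]   # one label per taxon in the sample


def apply_contaminant_threshold(
    read_counts: Dict[str, int],
    dominant_contaminants: Iterable[str],
) -> ThresholdResult:
    """Label taxa relative to the most abundant known contaminant in one sample."""
    contaminants = set(dominant_contaminants)

    # Sample-specific threshold: the highest read count among known contaminants.
    threshold = max((read_counts.get(name, 0) for name in contaminants), default=0)

    # Assumption: without any detected contaminant, no threshold can be defined.
    depth_adequate = threshold > 0

    labels: Dict[str, str] = {}
    for taxon, count in read_counts.items():
        if taxon in contaminants:
            labels[taxon] = "known contaminant"
        elif not depth_adequate:
            labels[taxon] = "unresolved"
        elif count > threshold:
            labels[taxon] = "above threshold"
        else:
            labels[taxon] = "possible contamination"
    return ThresholdResult(threshold, depth_adequate, labels)


if __name__ == "__main__":
    # Hypothetical read counts for a single clinical sample.
    sample = {"contaminant_A": 120, "contaminant_B": 15, "taxon_X": 5400, "taxon_Y": 40}
    result = apply_contaminant_threshold(sample, ["contaminant_A", "contaminant_B"])
    print(result.threshold, result.depth_adequate)
    for taxon, label in result.labels.items():
        print(f"{taxon}: {label}")
```

Because the threshold is recomputed for each sample, a high-biomass sample with few contaminant reads gets a lower cutoff than a low-biomass sample in which contaminants make up a large share of the reads.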

Keywords: 16S rRNA; NGS; acute cholangitis; contamination; rpoB; targeted amplicon sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Aged
  • Aged, 80 and over
  • Bacteria / genetics*
  • Bacterial Load / methods*
  • Bile / microbiology
  • Cholangitis / microbiology
  • Clinical Laboratory Techniques / methods*
  • Clinical Laboratory Techniques / standards
  • DNA Contamination
  • DNA, Bacterial / genetics*
  • Female
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Male
  • Microbiota / genetics
  • Middle Aged
  • RNA, Ribosomal, 16S / genetics*
  • Sequence Analysis, DNA
  • Young Adult

Substances

  • DNA, Bacterial
  • RNA, Ribosomal, 16S