RESCUE: a validated Nanopore pipeline to classify bacteria through long-read, 16S-ITS-23S rRNA sequencing

Front Microbiol. 2023 Jul 20:14:1201064. doi: 10.3389/fmicb.2023.1201064. eCollection 2023.

Abstract

Despite the advent of third-generation sequencing technologies, modern bacterial ecology studies still use Illumina to sequence small (~400 bp) hypervariable regions of the 16S rRNA SSU for phylogenetic classification. Sequencing a larger region of the rRNA gene operon removes the limitations and biases of sequencing small portions, allowing more accurate classification with deeper taxonomic resolution. With Nanopore sequencing now providing raw simplex reads with quality scores above Q20 using the kit 12 chemistry, the ease, cost, and portability of Nanopore give it a leading role in differential bacterial abundance analysis. Sequencing the near-entire rrn operon of bacteria and archaea exploits a universally conserved operon whose evolutionary polymorphisms provide taxonomic resolution. Here, a reproducible and validated pipeline, RRN-operon Enabled Species-level Classification Using EMU (RESCUE), was developed to facilitate the sequencing of bacterial rrn operons and to support import into phyloseq. Benchmarking RESCUE showed that fully processed reads now match or exceed the quality of Sanger sequencing, with median quality scores of approximately Q20 or higher, using R10.4 flow cells and Guppy SUP basecalling. The pipeline was validated with two complex mock samples, multiple sample types, actual Illumina data, and four databases. RESCUE sequencing is shown to drastically improve classification to the species level for most taxa and to resolve erroneous taxa caused by the use of short reads such as those from Illumina.
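
For readers unfamiliar with the Q-score notation used above, the following is a minimal sketch of the standard Phred relationship between a quality score and a per-base error probability (Q = -10 log10 P). It is not taken from the RESCUE pipeline: the function names are hypothetical, and the abstract does not state how the authors summarize per-read quality. Under this definition, Q20 corresponds to an error probability of 1 in 100.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def mean_read_quality(quals: list[float]) -> float:
    """Summarize a read's quality by averaging error probabilities.

    Averaging in probability space (rather than averaging the Q values
    directly) is an illustrative choice here, not the paper's stated method.
    """
    mean_err = sum(phred_to_error_prob(q) for q in quals) / len(quals)
    return error_prob_to_phred(mean_err)


if __name__ == "__main__":
    # Q20 corresponds to roughly one expected error per 100 bases.
    print(phred_to_error_prob(20))
    # A read mixing Q10 and Q30 bases scores well below their arithmetic mean.
    print(mean_read_quality([10, 30]))
```

Averaging error probabilities penalizes low-quality bases more heavily than a plain arithmetic mean of Q values would, so a few poor bases can pull a read below a Q20 threshold.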

Keywords: 16S rRNA; 16S-ITS-23S; Illumina; Nanopore; bacterial classification; microbiome; rrn; sequencing.

Grants and funding

This study was supported by the University of Florida's Microbiology and Cell Science Department and by NSF HSI Grant No. 1832436. This work was partially supported by USDA BRAG, grant number 2022-33522-38271 awarded to ET.