Serovar-level Identification of Bacterial Foodborne Pathogens From Full-length 16S rRNA Gene Sequencing

bioRxiv [Preprint]. 2023 Jun 28:2023.06.28.546915. doi: 10.1101/2023.06.28.546915.

Abstract

The resolution of variation within species is critical for interpreting and acting on many microbial measurements. In the key foodborne pathogens Escherichia coli and Salmonella, the primary sub-species classification scheme used is serotyping: differentiating variants within these species by surface antigen profiles. Serotype prediction from whole-genome sequencing (WGS) of isolates is now seen as comparable or preferable to traditional laboratory methods where WGS is available. However, laboratory and WGS methods depend on an isolation step that is time-consuming and incompletely represents the sample when multiple strains are present. Community sequencing approaches that skip the isolation step are therefore of interest for pathogen surveillance. Here we evaluated the viability of amplicon sequencing of the full-length 16S rRNA gene for serotyping S. enterica and E. coli. We developed a novel algorithm for serotype prediction, implemented as an R package (Seroplacer), which takes as input full-length 16S rRNA gene sequences and outputs serovar predictions after phylogenetic placement into a reference phylogeny. We achieved over 89% accuracy in predicting Salmonella serotypes on in silico test data, and identified key pathogenic serovars of Salmonella and E. coli in isolate and environmental test samples. Although serotype prediction from 16S sequences is not as accurate as serotype prediction from WGS of isolates, the potential to identify dangerous serovars directly from amplicon sequencing of environmental samples is intriguing for pathogen surveillance. The capabilities developed here are also broadly relevant to other applications where intra-species variation and direct sequencing from environmental samples could be valuable.
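The placement-then-prediction idea described above can be illustrated with a minimal Python sketch. This is not Seroplacer's algorithm or API (the package is implemented in R, and its interface is not described here). It assumes that a query sequence has already been placed at some node of a reference phylogeny, and it simply reports the serovars of the reference tips below that node with their relative frequencies. The tree, node names, tip labels, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy reference tree: each internal node maps to its children.
# Any node not listed as a key is treated as a tip.
CHILDREN = {
    "root": ["n1", "n2"],
    "n1": ["tipA", "tipB"],
    "n2": ["tipC", "n3"],
    "n3": ["tipD", "tipE"],
}

# Hypothetical serovar labels for the reference tips.
TIP_SEROVAR = {
    "tipA": "Typhimurium",
    "tipB": "Typhimurium",
    "tipC": "Enteritidis",
    "tipD": "Enteritidis",
    "tipE": "Dublin",
}


def descendant_tips(node):
    """Return all tips at or below `node` using an iterative traversal."""
    stack = [node]
    tips = []
    while stack:
        current = stack.pop()
        kids = CHILDREN.get(current)
        if kids:
            stack.extend(kids)
        else:
            tips.append(current)
    return tips


def serovar_summary(placement_node):
    """Map each serovar below the placement node to its share of the tips."""
    counts = Counter(TIP_SEROVAR[tip] for tip in descendant_tips(placement_node))
    total = sum(counts.values())
    return {serovar: n / total for serovar, n in counts.items()}


if __name__ == "__main__":
    # A query placed in a clade with a single serovar gives an unambiguous call;
    # a query placed in a mixed clade gives a ranked set of candidates.
    for node in ("n1", "n2"):
        print(node, serovar_summary(node))
```

In this toy setting, a placement inside a clade containing one serovar yields a single candidate, while a placement in a mixed clade yields several candidates with fractional support. That is one simple way to see why predictions from 16S sequences can be less certain than predictions from WGS of isolates.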

Publication types

  • Preprint