PneumoKITy: A fast, flexible, specific, and sensitive tool for Streptococcus pneumoniae serotype screening and mixed serotype detection from genome sequence data

Microb Genom. 2022 Dec;8(12):mgen000904. doi: 10.1099/mgen.0.000904.

Abstract

Determination of serotypes of Streptococcus pneumoniae is essential for monitoring current vaccine programmes. Since October 2017, pneumococcal serotypes in England have been derived from whole genome sequencing (WGS) data using our bioinformatic tool PneumoCaT. That tool was designed for serotype determination from pure cultures in a reference laboratory. To help determine multiple serotypes in pneumococcal carriage samples, we developed a new software tool named PneumoKITy (Pneumococcal K-mer Integrated Typing) that uses the powerful Mash k-mer screening method for pneumococcal serotyping. Mash k-mer screening is more sequence specific and much faster than the mapping method used in PneumoCaT and can determine 54 (58.1 %) of the 93 serotypes in the SSI Diagnostica phenotypical serotyping scheme to type level, with the remainder called to serogroup or subgroup level (e.g., 11A/D). PneumoKITy can be run on both FastQ and assembly input, requiring up to 11× less memory and running up to 29× faster than the current version of PneumoCaT (1.2.1) on FastQ files. PneumoKITy can be used as a rapid, flexible serotype screening method which adds sensitive detection of mixed serotypes, e.g., for nasopharyngeal carriage studies where the presence of multiple serotypes is common. PneumoKITy's ability to run from assembly files, for pure culture serotype detection, increases its speed. This speed potentially enables the software to be run with low infrastructure overhead via web-based platforms. PneumoKITy could be used as a fast initial screening method, with other tools used, if necessary, for those serotypes that could not be fully determined to type level. PneumoKITy was found to be highly accurate and sensitive when run on a panel of FastQ files derived from mixed cultures: all serotypes were accurately detected in 47/51 (92.2 %) of samples. PneumoKITy was also able to accurately estimate the relative abundance of serotypes in the same sample, with estimates falling, on average, within 1.5 % of the expected relative abundance in mixtures with known concentrations. PneumoKITy was able to detect minor serotypes with an expected abundance of 1 % in the known serotype mixtures. PneumoKITy is a rapid, flexible tool with wide-ranging applications outside of the pure-culture, reference laboratory serotyping remit of PneumoCaT.
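
As a rough illustration of the k-mer screening idea described above, the following Python sketch scores each reference sequence by the fraction of its k-mers that also occur in a set of reads, and reports every reference above a cut-off, so that more than one serotype can be reported for a mixed sample. This is not PneumoKITy's or Mash's implementation: it compares full k-mer sets rather than MinHash sketches, ignores reverse complements, and the function names, toy sequences, k value and threshold are all assumptions chosen for the example.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(reference, reads, k):
    """Fraction of the reference's k-mers found in at least one read."""
    ref_kmers = kmers(reference, k)
    if not ref_kmers:  # reference shorter than k: nothing to compare
        return 0.0
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(ref_kmers & read_kmers) / len(ref_kmers)


def screen(references, reads, k=5, threshold=0.9):
    """Return {name: score} for references scoring at or above threshold.

    Hypothetical helper: the default k and threshold are illustrative only.
    """
    hits = {}
    for name, seq in references.items():
        score = containment(seq, reads, k)
        if score >= threshold:
            hits[name] = score
    return hits


# Toy data (made up): two "serotype" references are present in the reads,
# a third is absent.
REFERENCES = {
    "typeA": "ACGTACGGTCA",
    "typeB": "TTGACCATGCA",
    "typeC": "GGGCCCAAATT",
}
READS = ["ACGTACGG", "TACGGTCA", "TTGACCAT", "ACCATGCA"]

if __name__ == "__main__":
    for name, score in screen(REFERENCES, READS).items():
        print(f"{name}\t{score:.2f}")
```

In this toy setting both typeA and typeB are reported because every one of their k-mers appears in the reads, while typeC shares none and is dropped. A real screen would use much longer k-mers and sketched references to keep memory and run time low.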

Keywords: Pneumococcus; bioinformatics; carriage; colonisation; detection; epidemiology; serotyping; software.

MeSH terms

  • Pneumococcal Vaccines*
  • Serogroup
  • Serotyping / methods
  • Streptococcus pneumoniae*
  • Whole Genome Sequencing

Substances

  • Pneumococcal Vaccines

Associated data

  • figshare/10.6084/m9.figshare.21067339.v1