Prediction system for rapid identification of Salmonella serotypes based on pulsed-field gel electrophoresis fingerprints

J Clin Microbiol. 2012 May;50(5):1524-32. doi: 10.1128/JCM.00111-12. Epub 2012 Feb 29.

Abstract

A classification model is presented for rapid identification of Salmonella serotypes based on pulsed-field gel electrophoresis (PFGE) fingerprints. The model was developed using random forest and support vector machine algorithms and was then applied to a database of 45,923 PFGE patterns randomly selected from all submissions to CDC PulseNet from 2005 to 2010. The selected patterns covered the 20 most frequent serotypes and 12 less frequent serotypes from various sources. Prediction accuracies for the 32 serotypes ranged from 68.8% to 99.9% (overall accuracy, 96.0%) for random forest classification and from 67.8% to 100.0% (overall accuracy, 96.1%) for support vector machine classification. The prediction system improves reliability and accuracy and provides a new tool for early, fast screening and source tracking of outbreak isolates. It is especially useful for obtaining serotype information before conventional methods are completed. The system also works well for isolates that are serotyped as "unknown" by conventional methods, and it is useful for laboratories where standard serotyping is not available.
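
The abstract reports both a per-serotype accuracy range and a single overall accuracy, and the two can differ widely when serotypes are unevenly represented. The following is a minimal sketch of how those two figures relate, not the authors' code: the function name, the serotype labels, and the toy counts are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_class_and_overall_accuracy(true_labels, predicted_labels):
    """Return ({label: accuracy}, overall_accuracy) for paired label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    # Number of isolates per true serotype.
    totals = Counter(true_labels)
    # Number of correctly predicted isolates per true serotype.
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )

    per_class = {label: correct[label] / totals[label] for label in totals}
    overall = sum(correct.values()) / len(true_labels)
    return per_class, overall


# Hypothetical toy data: two well-represented serotypes and one rarer one.
true_labels = ["Enteritidis"] * 8 + ["Typhimurium"] * 8 + ["Newport"] * 4
predicted_labels = (
    ["Enteritidis"] * 8
    + ["Typhimurium"] * 7 + ["Newport"]
    + ["Newport"] * 2 + ["Typhimurium"] * 2
)

per_class, overall = per_class_and_overall_accuracy(true_labels, predicted_labels)
for label, accuracy in sorted(per_class.items()):
    print(f"{label}: {accuracy:.3f}")
print(f"overall: {overall:.3f}")
```

In this toy case the rarer serotype is predicted correctly only half the time, yet the overall accuracy stays at 0.85 because the frequent serotypes dominate the count. The same effect would allow a low per-serotype minimum to coexist with a high overall figure.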

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Cluster Analysis
  • Computational Biology / methods
  • DNA Fingerprinting / methods*
  • Electrophoresis, Gel, Pulsed-Field / methods*
  • Genotype
  • Humans
  • Molecular Typing / methods*
  • Salmonella / classification*
  • Salmonella / genetics*
  • Salmonella / isolation & purification
  • Serotyping