A machine learning approach to identifying Salmonella stress response genes in isolates from poultry processing

Food Res Int. 2024 Jan:175:113635. doi: 10.1016/j.foodres.2023.113635. Epub 2023 Nov 2.

Abstract

We explored the potential of machine learning to identify significant genes associated with Salmonella stress response during poultry processing using whole genome sequencing (WGS) data. The Salmonella isolates (n = 177) used in this study were obtained from various chicken sources (skin before chiller, chicken carcass before chiller, frozen chicken, and post-chill chicken carcass). Six machine learning algorithms (random forest, neural network, cost-sensitive learning, logit boost, and support vector machines with linear and radial kernels) were trained on Salmonella WGS data, and model fit was assessed using standard evaluation metrics such as the area under the receiver operating characteristic (AUROC) curve and confusion matrix statistics. All models achieved high performance based on the AUROC metric, and logit boost performed best, with an AUROC of 0.904, sensitivity of 0.889, and specificity of 0.920. The significant genes identified included ybtX, which encodes a Yersiniabactin-associated zinc transporter, and the transferase-encoding genes yccK and thiS. Additionally, genes coding for cold (cspA, cspD, and cspE) and heat shock (rpoH and rpoE) responses were identified. Other significant genes included those involved in lipopolysaccharide biosynthesis (irp1, waaD, rfc, and rfbX), DNA repair and replication (traI), biofilm formation (ccdA and fyuA), and cellular metabolism (irtA).
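
For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how AUROC, sensitivity, and specificity can be computed for a binary classifier. It is not the study's pipeline: the abstract does not state the software used or the exact class labels, so the function names, the 0.5 decision threshold, and the toy labels and scores below are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels coded as 1 (positive) and 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six isolates with made-up labels and classifier scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_sensitivity, toy_specificity = sensitivity_specificity(toy_labels, toy_preds)
toy_auroc = auroc(toy_labels, toy_scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together.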

Keywords: Machine learning; Poultry processing; Salmonella; Stress response; Whole genome sequencing.

MeSH terms

  • Animals
  • Chickens / genetics
  • Machine Learning
  • Poultry*
  • Salmonella* / genetics
  • Whole Genome Sequencing