Machine learning to predict foodborne salmonellosis outbreaks based on genome characteristics and meteorological trends

Curr Res Food Sci. 2023 May 28:6:100525. doi: 10.1016/j.crfs.2023.100525. eCollection 2023.

Abstract

Several studies have shown a correlation between outbreaks of Salmonella enterica and meteorological trends, particularly temperature and precipitation. Additionally, current outbreak-based studies are performed on data for the species Salmonella enterica as a whole, without considering its intra-species and genetic heterogeneity. In this study, we analyzed the effect of differential gene expression and a suite of meteorological factors on salmonellosis outbreak scale (typified by case numbers) using a combination of machine learning and count-based modeling methods. An Elastic Net regularization model was used to identify significant genes from a Salmonella pan-genome, and a multi-variable Poisson regression was developed to fit the individual and mixed effects data. The best-fit Elastic Net model (α = 0.50; λ = 2.18) identified 53 significant gene features. The final multi-variable Poisson regression model (χ² = 5748.22; pseudo R² = 0.669; probability > χ² = 0) identified 127 significant predictor terms (p < 0.10), comprising 45 gene-only predictors, three meteorological predictors (average temperature, average precipitation, and average snowfall), and 79 gene-meteorological interaction terms. The significant genes spanned functions including cellular signaling and transport, virulence, metabolism, and stress response, and included gene variables not considered significant by the baseline model. This study presents a holistic approach to evaluating multiple data sources (such as genomic and environmental data) to predict outbreak scale, which could help in revising estimates of human health risk.

Keywords: Elastic net; Meteorological variables; Outbreaks; Salmonellosis; Whole genome sequencing.
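
The two-stage workflow described in the abstract (Elastic Net gene selection followed by a Poisson regression with gene-meteorological interaction terms) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic and deterministic, the four gene columns and the single standardized temperature variable are hypothetical, the Elastic Net stage assumes a squared-error loss on log case counts (the abstract does not state the loss), and the penalty value λ = 0.05 is chosen for the toy data rather than taken from the study. Significance testing of the fitted terms is omitted.

```python
import math

def soft_threshold(z, g):
    """Soft-thresholding operator arising from the L1 part of the penalty."""
    return math.copysign(max(abs(z) - g, 0.0), z)

def elastic_net(X, y, alpha=0.5, lam=0.05, n_iter=100):
    """Coordinate descent for (1/2n)||y - b0 - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centered features
    y_mean = sum(y) / n
    resid = [yi - y_mean for yi in y]  # residual when all coefficients are zero
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            ss = sum(c * c for c in col) / n
            # Covariance of feature j with the partial residual (its own fit added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + ss * b[j]
            new = soft_threshold(rho, lam * alpha) / (ss + lam * (1 - alpha)) if ss > 0 else 0.0
            resid = [r - c * (new - b[j]) for r, c in zip(resid, col)]
            b[j] = new
    return y_mean - sum(m * bj for m, bj in zip(means, b)), b

def solve(A, v):
    """Solve A x = v by Gaussian elimination with partial pivoting."""
    m = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def poisson_fit(Z, y, n_iter=25):
    """Newton-Raphson for a log-link Poisson model; Z must include an intercept column."""
    q = len(Z[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (q - 1)  # start at the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(sum(z[k] * beta[k] for k in range(q))) for z in Z]
        grad = [sum(z[k] * (yi - mi) for z, yi, mi in zip(Z, y, mu)) for k in range(q)]
        hess = [[sum(z[a] * z[c] * mi for z, mi in zip(Z, mu)) for c in range(q)] for a in range(q)]
        beta = [bk + sk for bk, sk in zip(beta, solve(hess, grad))]
    return beta

# Synthetic data (assumption): 4 hypothetical genes in a full presence/absence design,
# crossed with 5 standardized temperature values. Only gene0 and gene2 affect case counts.
temps = [-1.0, -0.5, 0.0, 0.5, 1.0]
rows = [([(i >> j) & 1 for j in range(4)], t) for i in range(16) for t in temps]
genes = [g for g, _ in rows]
cases = [round(math.exp(3.0 + 0.6 * g[0] - 0.4 * g[2] + 0.3 * t + 0.2 * g[0] * t)) for g, t in rows]

# Stage 1: Elastic Net on log case counts keeps genes with non-zero coefficients.
_, coefs = elastic_net(genes, [math.log(c) for c in cases], alpha=0.5, lam=0.05)
selected = [j for j, c in enumerate(coefs) if c != 0.0]

# Stage 2: Poisson regression on selected genes, temperature, and gene-by-temperature terms.
Z = [[1.0] + [g[j] for j in selected] + [t] + [g[j] * t for j in selected] for g, t in rows]
beta = poisson_fit(Z, cases)
names = ["intercept"] + [f"gene{j}" for j in selected] + ["temp"] + [f"gene{j}:temp" for j in selected]
for name, value in zip(names, beta):
    print(f"{name:>12s}  {value: .3f}")
```

In this toy design the interaction columns are simply products of a gene indicator and the meteorological variable, which is one common way to encode the "gene-meteorological interaction terms" mentioned above. A real analysis would typically rely on an established statistics package and would also report standard errors and p-values for each term.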