Epidemiological Data Mining for Assisting with Foodborne Outbreak Investigation

Foods. 2023 Oct 19;12(20):3825. doi: 10.3390/foods12203825.

Abstract

Diseases caused by the consumption of contaminated food are a significant but avoidable public health issue, and identifying the source of contamination is a key step in an outbreak investigation to prevent further illnesses. Historical foodborne outbreaks provide rich data on critical attributes such as outbreak factors, food vehicles, and etiologies, and a better understanding of the relationships between these attributes could inform effective food safety interventions. The purpose of this study was to identify hidden patterns in the relationships between the critical attributes of historical foodborne outbreaks through data mining approaches. A statistical analysis was used to identify the associations between outbreak factors and food sources, and the factors with strongly significant associations were selected as predictive factors for food vehicles. A multinomial prediction model was built on the selected factors to predict "simple" foods (beef, dairy, and vegetables) as sources of outbreaks. In addition, the relationships between food vehicles and common etiologies were investigated through text mining approaches (support vector machines, logistic regression, random forest, and naïve Bayes). A support vector machine model was identified as the optimal model for predicting etiologies from the occurrence of food vehicles. Association rules also indicated the specific food vehicles that are strongly related to particular etiologies. Meanwhile, a food ingredient network describing the relationships between foods and ingredients was constructed and used with Monte Carlo simulation to predict the possible ingredients of a food implicated in an outbreak. The simulation results were confirmed against foods and ingredients already known to have caused historical foodborne outbreaks. Given the name of a food, the method could therefore suggest the possible ingredient sources of contamination. Overall, the results could support the early identification of contaminated food sources and assist in future outbreak investigations. The data-driven approach offers a new perspective and new strategies for discovering hidden knowledge in large volumes of outbreak data.
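
To make the ingredient-prediction idea concrete, the following is a minimal Python sketch of Monte Carlo sampling over a food ingredient network. It is not the authors' implementation: the network contents, the edge weights, and the names NETWORK, sample_ingredient, and simulate are all hypothetical and chosen only for illustration. Each trial walks from the named food down to a basic ingredient, and the share of trials ending at each ingredient serves as a rough estimate of how strongly that ingredient is linked to the food.

```python
import random
from collections import Counter

# Hypothetical toy network: each food maps to (component, weight) pairs.
# Names and weights are illustrative assumptions, not data from the study.
NETWORK = {
    "cheeseburger": [("beef patty", 0.5), ("cheese", 0.2),
                     ("bun", 0.2), ("lettuce", 0.1)],
    "beef patty": [("ground beef", 0.9), ("salt", 0.1)],
    "cheese": [("milk", 1.0)],
    "bun": [("wheat flour", 1.0)],
}


def sample_ingredient(food, network, rng):
    """Walk from a food to a basic ingredient (a node with no components)."""
    node = food
    while node in network:
        components, weights = zip(*network[node])
        # Pick one component at random, proportionally to its weight.
        node = rng.choices(components, weights=weights, k=1)[0]
    return node


def simulate(food, network, n_trials=10_000, seed=0):
    """Estimate how often each basic ingredient is reached from `food`."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    counts = Counter(
        sample_ingredient(food, network, rng) for _ in range(n_trials)
    )
    return {name: count / n_trials for name, count in counts.most_common()}


estimates = simulate("cheeseburger", NETWORK)
for ingredient, share in estimates.items():
    print(f"{ingredient}: {share:.3f}")
```

In this toy setting the ranking simply reflects the assumed weights. In practice the weights would have to come from recipe or outbreak data, and the ranked ingredients would be checked against known historical outbreaks, as the abstract describes.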

Keywords: cross contamination; data mining; epidemiological data; foodborne diseases; network analysis; outbreak investigation.