Crowdsourcing and machine learning approaches for extracting entities indicating potential foodborne outbreaks from social media

Dandan Tao; Dongyu Zhang; Ruofan Hu; Elke Rundensteiner; Hao Feng

doi:10.1038/s41598-021-00766-w

Sci Rep. 2021 Nov 4;11(1):21678. doi: 10.1038/s41598-021-00766-w.

Authors

Dandan Tao¹, Dongyu Zhang², Ruofan Hu², Elke Rundensteiner³ ⁴, Hao Feng⁵

Affiliations

¹ Department of Food Science and Human Nutrition, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, 382F Agricultural Engineering Sciences Building, 1304 W. Pennsylvania Ave., Urbana, IL, 61801, USA.
² Data Science Program, Worcester Polytechnic Institute, Fuller Labs 135, 100 Institute Road, Worcester, MA, 01609, USA.
³ Data Science Program, Worcester Polytechnic Institute, Fuller Labs 135, 100 Institute Road, Worcester, MA, 01609, USA. rundenst@wpi.edu.
⁴ Department of Computer Science, Worcester Polytechnic Institute, Worcester, USA. rundenst@wpi.edu.
⁵ Department of Food Science and Human Nutrition, College of Agricultural, Consumer and Environmental Sciences, University of Illinois at Urbana-Champaign, 382F Agricultural Engineering Sciences Building, 1304 W. Pennsylvania Ave., Urbana, IL, 61801, USA. haofeng@illinois.edu.

Abstract

Foodborne outbreaks are a serious but preventable threat to public health that often lead to illness, loss of life, significant economic loss, and the erosion of consumer confidence. Understanding how consumers respond when interacting with foods, together with extracting information from posts on social media, may provide new means of reducing the risks and curtailing outbreaks. In recent years, Twitter has been employed as a new tool for identifying unreported foodborne illnesses. However, there is a huge gap between the identification of sporadic illnesses and the early detection of a potential outbreak. In this work, a dual-task BERTweet model was developed to identify unreported foodborne illnesses and extract foodborne-illness-related entities from Twitter. Unlike previous methods, our model leveraged the mutually beneficial relationships between the two tasks. The results showed that the F1-score of relevance prediction was 0.87, and the F1-score of entity extraction was 0.61. Key elements such as time, location, and food detected from sentences indicating foodborne illnesses were used to analyze potential foodborne outbreaks in a large collection of historical tweets. A case study on tweets indicating foodborne illnesses showed that the discovered trend is consistent with the true outbreaks that occurred during the same period.
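The two F1-scores reported above measure different things: one is computed over tweet-level relevance labels, the other over extracted entities. The abstract does not state the exact evaluation protocol, so the following is only a minimal sketch under stated assumptions: relevance is treated as a binary label per tweet, and an entity counts as correct only when its span and type match exactly. The function names and the `(tweet_id, start, end, type)` tuple layout are hypothetical and chosen for illustration.

```python
from typing import Sequence, Set, Tuple

# Hypothetical entity representation: (tweet_id, start_token, end_token, type)
Entity = Tuple[int, int, int, str]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2PR / (P + R); 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relevance_f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 of the positive class for binary tweet-level relevance labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    return f1_from_counts(tp, fp, fn)


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> float:
    """Exact-match entity F1 (assumed protocol): span and type must agree."""
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    return f1_from_counts(tp, fp, fn)


# Toy data, invented for illustration only (not from the paper).
gold_labels = [1, 1, 0, 0]
pred_labels = [1, 0, 1, 0]
gold_entities = {(0, 3, 4, "FOOD"), (0, 7, 8, "LOC")}
pred_entities = {(0, 3, 4, "FOOD"), (0, 7, 9, "LOC")}  # second span is off by one

print(relevance_f1(gold_labels, pred_labels))
print(entity_f1(gold_entities, pred_entities))
```

Under exact matching, a predicted span that is off by a single token counts as both a false positive and a false negative, which is one plausible reason entity-level scores tend to sit below sentence-level ones.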

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Contact Tracing / methods*
Crowdsourcing / methods
Disease Outbreaks / prevention & control*
Foodborne Diseases / epidemiology*
Foodborne Diseases / etiology
Humans
Machine Learning
Models, Theoretical
Population Surveillance / methods
Public Health / methods
Public Health / trends
Social Media / trends
