Predicting nationwide obesity from food sales using machine learning

Jocelyn Dunstan; Marcela Aguirre; Magdalena Bastías; Claudia Nau; Thomas A Glass; Felipe Tobar

doi:10.1177/1460458219845959

Health Informatics J. 2020 Mar;26(1):652-663. doi: 10.1177/1460458219845959. Epub 2019 May 19.

Authors

Jocelyn Dunstan¹, Marcela Aguirre², Magdalena Bastías², Claudia Nau, Thomas A Glass³, Felipe Tobar²

Affiliations

¹ Johns Hopkins University, USA; University of Chile, Chile.
² University of Chile, Chile.
³ Johns Hopkins University, USA.

PMID: 31106648

Abstract

The obesity epidemic continues to progress worldwide, and frequent nationwide surveys to measure the percentage of the population that is obese are costly to implement. In contrast, country-level food sales information can be obtained inexpensively and regularly from different suppliers. This study applies a methodology to predict obesity prevalence at the country level from national sales of a small subset of food and beverage categories. Three machine learning algorithms for nonlinear regression were implemented using purchase and obesity prevalence data from 79 countries: support vector machines, random forests and extreme gradient boosting. The proposed method was validated in terms of both the absolute prediction error and the proportion of countries for which the obesity prevalence was predicted satisfactorily. We found that the most relevant food category for predicting obesity is baked goods and flours, followed by cheese and carbonated drinks.
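The two validation criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. The function names, the tolerance of 2.0 percentage points used as the cutoff for a "satisfactory" prediction, and the numbers are all assumptions chosen for illustration; the abstract does not state the threshold the study used.

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """Average absolute gap between observed and predicted prevalence."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


def share_within_tolerance(observed, predicted, tolerance):
    """Fraction of countries whose absolute error is at most `tolerance`."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


# Made-up obesity prevalence values (percent) for four hypothetical countries.
# These are NOT data from the study.
observed = [20.0, 30.0, 10.0, 25.0]
predicted = [22.0, 29.0, 16.0, 25.0]

# Hypothetical cutoff for a "satisfactory" prediction, in percentage points.
mae = mean_absolute_error(observed, predicted)
share = share_within_tolerance(observed, predicted, tolerance=2.0)

print(f"mean absolute error: {mae:.2f} percentage points")
print(f"share of countries within tolerance: {share:.2f}")
```

In practice the predicted values would come from a fitted regressor (for example a support vector machine, random forest or gradient boosting model) evaluated on countries held out from training, so that both figures reflect out-of-sample performance.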

Keywords: databases and data mining; food sales; machine learning; obesity; supervised learning.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Commerce
Food*
Humans
Machine Learning*
Obesity / epidemiology
Support Vector Machine

Grants and funding

U54 HD070725/HD/NICHD NIH HHS/United States