Analysis and prediction of hand, foot and mouth disease incidence in China using Random Forest and XGBoost

PLoS One. 2021 Dec 22;16(12):e0261629. doi: 10.1371/journal.pone.0261629. eCollection 2021.

Abstract

Hand, foot and mouth disease (HFMD) is an increasingly serious public health problem that has caused outbreaks in China every year since 2008. Predicting HFMD incidence and analyzing the factors that influence it are of great significance for prevention. Machine learning has shown advantages in infectious disease modeling, but few machine learning studies of HFMD incidence cover all the provinces of mainland China. In this study, we used two machine learning algorithms, Random Forest and eXtreme Gradient Boosting (XGBoost), for analysis and prediction. We first used Random Forest to examine the association between HFMD incidence and potential influential factors in the 31 provinces of mainland China. Next, we built Random Forest and XGBoost prediction models with meteorological and social factors as predictors. Finally, we applied the prediction models to four different regions of mainland China and evaluated their performance. Our results show that: 1) Meteorological and social factors jointly affect the incidence of HFMD in mainland China, with average temperature and population density being the two most significant influential factors; 2) The delayed effect of population flux on HFMD incidence differs between regions, and from a national perspective the model using population flux data delayed by one month has better prediction performance; 3) Overall, the XGBoost model predicted better than the Random Forest model and is more suitable for predicting the incidence of HFMD in mainland China.
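As an illustration of what "population flux data delayed for one month" means in practice, the following is a minimal sketch of building a lagged predictor and comparing candidate delays. It is not the study's method: the authors compared the prediction performance of their models, whereas this sketch uses a plain Pearson correlation as a simpler stand-in. The helper names (lag_feature, pearson) and all numbers are hypothetical and synthetic, constructed so that incidence follows flux from one month earlier.

```python
import math


def lag_feature(values, k):
    """Align month t with the predictor value from month t - k.

    The first k months have no lagged value and are filled with None.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return [None] * k + list(values[:len(values) - k])


def pearson(xs, ys):
    """Pearson correlation over positions where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (sd_x * sd_y)


# Synthetic monthly data (NOT from the study). Incidence in month t is
# built from flux in month t - 1, so a one-month delay should fit best.
flux = [10, 14, 9, 20, 25, 18, 30, 22, 15, 12, 28, 35]
incidence = [5.0] + [0.5 * f + 1.0 for f in flux[:-1]]

# Score each candidate delay (0, 1 or 2 months) and keep the best one.
scores = {k: pearson(lag_feature(flux, k), incidence) for k in range(3)}
best_lag = max(scores, key=scores.get)
print("best lag (months):", best_lag)
```

In a full workflow, the lagged column would be passed to the Random Forest or XGBoost model together with the other predictors, and the delay would be chosen by held-out prediction error rather than by correlation.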

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China / epidemiology
  • Disease Outbreaks
  • Hand, Foot and Mouth Disease / epidemiology*
  • Humans
  • Incidence
  • Machine Learning
  • Meteorological Concepts
  • Population Density
  • Public Health
  • Risk Factors
  • Temperature

Grants and funding

This study was supported by Natural Science Foundation of Shandong under Grant ZR2018MH037. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.