Improving the precision of modeling the incidence of hemorrhagic fever with renal syndrome in mainland China with an ensemble machine learning approach

PLoS One. 2021 Mar 16;16(3):e0248597. doi: 10.1371/journal.pone.0248597. eCollection 2021.

Abstract

Objective: Hemorrhagic fever with renal syndrome (HFRS), one of the main public health concerns in mainland China, is a group of clinically similar diseases caused by hantaviruses. Statistical approaches are widely used to forecast the incidence of infectious diseases so that their prevalence and outbreak potential can be controlled effectively. Compared with a single base model, model stacking can often produce better forecasting results. In this study, we fitted the monthly reported cases of HFRS in mainland China with a model stacking approach and compared its forecasting performance with those of five base models.

Method: We fitted the monthly reported cases of HFRS in mainland China from January 2004 to June 2019 with five base models: an autoregressive integrated moving average (ARIMA) model; the Holt-Winters (HW) method; seasonal decomposition of time series by LOESS (STL); a neural network autoregressive (NNAR) model; and an exponential smoothing state space model with a Box-Cox transformation, ARMA errors, and trend and seasonal components (TBATS). We then combined the forecasting results with the inverse rank approach. Forecasting performance was assessed with several accuracy criteria, including the mean absolute percentage error (MAPE), root-mean-squared error (RMSE) and mean absolute error (MAE).
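The abstract does not spell out how the inverse rank combination was computed. A minimal sketch of one common reading follows, under stated assumptions: each base model is ranked by an error measure (rank 1 for the smallest error) and receives a weight proportional to 1/rank. The function names, the choice of RMSE as the ranking criterion, and all numbers below are hypothetical illustrations, not values or code from the study.

```python
import numpy as np

def inverse_rank_weights(errors):
    """Weights proportional to 1/rank, where rank 1 is the smallest error.

    Assumption: ties are broken by input order; the study may differ.
    """
    errors = np.asarray(errors, dtype=float)
    # argsort applied twice yields 0-based ranks; add 1 for 1-based ranks
    ranks = np.argsort(np.argsort(errors)) + 1
    inverse = 1.0 / ranks
    return inverse / inverse.sum()

def combine_forecasts(forecasts, errors):
    """Weighted average of base-model forecasts.

    forecasts: array of shape (n_models, horizon)
    errors:    one error value per model, used only for ranking
    """
    weights = inverse_rank_weights(errors)
    return weights @ np.asarray(forecasts, dtype=float)

# Hypothetical error values for five base models (not from the paper)
example_errors = [150.0, 170.0, 160.0, 200.0, 180.0]
# Hypothetical 3-month forecasts, one row per base model
example_forecasts = np.array([
    [900.0, 850.0, 800.0],
    [950.0, 870.0, 790.0],
    [920.0, 860.0, 810.0],
    [990.0, 900.0, 760.0],
    [940.0, 880.0, 780.0],
])

example_weights = inverse_rank_weights(example_errors)
example_combined = combine_forecasts(example_forecasts, example_errors)
```

Under this scheme the best-ranked model always has the largest weight, but no model is dropped entirely, which is what distinguishes it from simply selecting the single best base model.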

Result: The HFRS time series for mainland China showed a slight downward trend and clear seasonal periodicity. The model stacking method performed best in terms of both fitting (RMSE 128.19, MAE 85.63, MAPE 8.18) and prediction (RMSE 151.86, MAE 118.28, MAPE 13.16).
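For readers who want to reproduce accuracy criteria of this kind, the three measures can be computed as in the sketch below. It uses the standard textbook definitions and assumes MAPE is expressed as a percentage; the abstract does not state the unit, and the function name and sample numbers are hypothetical.

```python
import numpy as np

def accuracy_metrics(actual, predicted):
    """Return RMSE, MAE and MAPE (in percent) for paired observations.

    Assumption: no actual value is zero, since MAPE divides by it.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    error = actual - predicted
    return {
        "RMSE": float(np.sqrt(np.mean(error ** 2))),
        "MAE": float(np.mean(np.abs(error))),
        "MAPE": float(100.0 * np.mean(np.abs(error / actual))),
    }

# Hypothetical monthly case counts and forecasts (not from the paper)
example_metrics = accuracy_metrics([100.0, 200.0], [110.0, 180.0])
```

RMSE and MAE are in the units of the series (monthly cases here), while MAPE is scale-free, which is why the three are often reported together.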

Conclusion: Model stacking using the optimal mean forecasting weight of the five abovementioned models achieved the best performance in predicting HFRS one year into the future. This study corroborates the conclusion that model stacking is an easy way to enhance prediction accuracy when modeling HFRS.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China / epidemiology
  • Datasets as Topic
  • Disease Outbreaks / statistics & numerical data*
  • Epidemiological Monitoring*
  • Forecasting / methods
  • Hemorrhagic Fever with Renal Syndrome / epidemiology*
  • Hemorrhagic Fever with Renal Syndrome / virology
  • Humans
  • Incidence
  • Machine Learning*
  • Models, Statistical
  • Neural Networks, Computer*
  • Orthohantavirus / pathogenicity
  • Seasons

Grants and funding

This study was supported by the National Natural Science Foundation of China (Grant No. 81202254 and 71974199) and the Health and Medical Big Data Research Project of China Medical University (Grant No. HMB201903105).