Improving the precision of modeling the incidence of hemorrhagic fever with renal syndrome in mainland China with an ensemble machine learning approach

Guo-Hua Ye; Mirxat Alim; Peng Guan; De-Sheng Huang; Bao-Sen Zhou; Wei Wu

doi:10.1371/journal.pone.0248597

PLoS One. 2021 Mar 16;16(3):e0248597. doi: 10.1371/journal.pone.0248597. eCollection 2021.

Authors

Guo-Hua Ye¹, Mirxat Alim¹, Peng Guan¹, De-Sheng Huang², Bao-Sen Zhou¹, Wei Wu¹

Affiliations

¹ Department of Epidemiology, School of Public Health, China Medical University, Shenyang, Liaoning, China.
² Department of Mathematics, School of Fundamental Sciences, China Medical University, Shenyang, Liaoning, China.

Abstract

Objective: Hemorrhagic fever with renal syndrome (HFRS), one of the main public health concerns in mainland China, is a group of clinically similar diseases caused by hantaviruses. Statistical approaches have long been used to forecast the incidence of infectious diseases so that their prevalence and outbreak potential can be controlled more effectively. Compared with a single base model, model stacking can often produce better forecasting results. In this study, we fitted the monthly reported cases of HFRS in mainland China with a model stacking approach and compared its forecasting performance with that of five base models.

Method: We fitted the monthly reported cases of HFRS in mainland China from January 2004 to June 2019 with five base models: an autoregressive integrated moving average (ARIMA) model; the Holt-Winters (HW) method; seasonal decomposition of the time series by LOESS (STL); a neural network autoregressive (NNAR) model; and an exponential smoothing state space model with a Box-Cox transformation, ARMA errors, and trend and seasonal components (TBATS). We then combined the forecasts of the five models with the inverse rank approach. Forecasting performance was assessed with several accuracy criteria: the mean absolute percentage error (MAPE), the root-mean-squared error (RMSE) and the mean absolute error (MAE).
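The combination step and the three accuracy criteria can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say which error measure was used to rank the models, so ranking by RMSE is an assumption here, as are the function names and the toy numbers, which are not data from the study. Under the inverse rank approach as sketched, the model ranked r-th receives a weight proportional to 1/r, and the weights are normalized to sum to one.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def inverse_rank_weights(errors):
    """Map {model: error} to {model: weight}; rank 1 is the smallest error.

    Ties are not treated specially: tied models get consecutive ranks.
    """
    ordered = sorted(errors, key=errors.get)
    inverse = {name: 1.0 / rank for rank, name in enumerate(ordered, start=1)}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def combine(forecasts, weights):
    """Weighted average of the base-model forecasts at each horizon step."""
    horizon = len(next(iter(forecasts.values())))
    return [sum(weights[m] * forecasts[m][t] for m in forecasts) for t in range(horizon)]

# Toy illustration only: made-up monthly counts, NOT values from the study.
observed = [900.0, 1100.0, 1000.0]
fitted = {
    "ARIMA": [880.0, 1150.0, 990.0],
    "HW": [950.0, 1000.0, 1080.0],
    "STL": [910.0, 1120.0, 1030.0],
}
# Assumption: models are ranked by their in-sample RMSE.
in_sample_rmse = {name: rmse(observed, values) for name, values in fitted.items()}
weights = inverse_rank_weights(in_sample_rmse)
stacked = combine(fitted, weights)
```

Ranks rather than raw error magnitudes determine the weights, so one base model with a very large error is down-weighted without dominating the normalization.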

Result: The HFRS time series for mainland China showed a slight downward trend and obvious seasonal periodicity. The model stacking method performed best in terms of both fitting (RMSE 128.19, MAE 85.63, MAPE 8.18) and prediction (RMSE 151.86, MAE 118.28, MAPE 13.16).

Conclusion: The results showed that model stacking by using the optimal mean forecasting weight of the five abovementioned models achieved the best performance in predicting HFRS one year into the future. This study corroborates the conclusion that model stacking is an easy way to enhance prediction accuracy when modeling HFRS.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

China / epidemiology
Datasets as Topic
Disease Outbreaks / statistics & numerical data*
Epidemiological Monitoring*
Forecasting / methods
Hemorrhagic Fever with Renal Syndrome / epidemiology*
Hemorrhagic Fever with Renal Syndrome / virology
Humans
Incidence
Machine Learning*
Models, Statistical
Neural Networks, Computer*
Orthohantavirus / pathogenicity
Seasons

Grants and funding

This study was supported by the National Natural Science Foundation of China (Grant No. 81202254 and 71974199) and the Health and Medical Big Data Research Project of China Medical University (Grant No. HMB201903105).