Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study

Zheng-Gang Fang; Shu-Qin Yang; Cai-Xia Lv; Shu-Yi An; Wei Wu

doi:10.1136/bmjopen-2021-056685

BMJ Open. 2022 Jul 1;12(7):e056685. doi: 10.1136/bmjopen-2021-056685.

Authors

Zheng-Gang Fang¹, Shu-Qin Yang¹, Cai-Xia Lv¹, Shu-Yi An², Wei Wu³

Affiliations

¹ Department of Epidemiology, China Medical University, Shenyang, China.
² Department of Social Medicine and Health, Liaoning Provincial Center for Disease Control and Prevention, Shenyang, China.
³ Department of Epidemiology, China Medical University, Shenyang, China. wuwei@cmu.edu.cn

Abstract

Objective: The COVID-19 outbreak was first reported in Wuhan, China, and has been acknowledged as a pandemic due to its rapid spread worldwide. Predicting the trend of COVID-19 is of great significance for its prevention. A comparison between the autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model was conducted to determine which was more accurate for anticipating the occurrence of COVID-19 in the USA.

Design: Time-series study.

Setting: The USA was the setting for this study.

Main outcome measures: Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were applied to evaluate the performance of the two models.
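
As an illustration of the three accuracy metrics named above, the following minimal sketch computes MAE, RMSE and MAPE using their standard textbook definitions. The function names and the sample values are hypothetical and are not taken from the study; MAPE is expressed as a percentage and is undefined when any observed value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined if any actual value is zero (division by zero).
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical observed and predicted case counts (not data from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(f"MAE:  {mae(actual, predicted):.3f}")
print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"MAPE: {mape(actual, predicted):.3f}%")
```

For all three metrics, a lower value indicates a closer fit between predicted and observed counts, which is the sense in which two models can be compared.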

Results: For both the training set and the validation set, the MAE, RMSE and MAPE of the XGBoost model were lower than those of the ARIMA model.

Conclusions: Compared with the ARIMA model, the XGBoost model can help improve the prediction of COVID-19 cases in the USA.

Keywords: COVID-19; epidemiology.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

COVID-19* / epidemiology
China / epidemiology
Forecasting
Humans
Incidence
Models, Statistical*
United States / epidemiology