Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study

BMJ Open. 2022 Jul 1;12(7):e056685. doi: 10.1136/bmjopen-2021-056685.

Abstract

Objective: The COVID-19 outbreak was first reported in Wuhan, China, and was declared a pandemic because of its rapid worldwide spread. Accurately predicting the trend of COVID-19 is important for its prevention. This study compared the autoregressive integrated moving average (ARIMA) model with the eXtreme Gradient Boosting (XGBoost) model to determine which more accurately forecasts COVID-19 cases in the USA.
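The abstract does not say how XGBoost was applied to a time series. XGBoost is a general-purpose regression model rather than a dedicated time-series method, so one common approach, assumed here and not confirmed by the source, is to reframe the daily case series as a supervised learning problem in which the previous few values are the features and the next value is the target. The sketch below shows that framing together with a chronological (unshuffled) training/validation split. The function names `make_lag_features` and `chronological_split`, the parameters `n_lags` and `train_fraction`, and the case counts are all hypothetical.

```python
from typing import List, Tuple


def make_lag_features(series: List[float], n_lags: int) -> Tuple[List[List[float]], List[float]]:
    """Reframe a univariate series as (X, y) pairs.

    Each row of X holds the n_lags values immediately preceding time t,
    and the matching entry of y is the value at time t.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])  # window of past observations
        y.append(series[t])             # value to be predicted
    return X, y


def chronological_split(X, y, train_fraction: float = 0.8):
    """Split without shuffling so every validation row comes after every training row in time."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Hypothetical daily case counts, for illustration only.
daily_cases = [10.0, 12.0, 15.0, 19.0, 24.0, 30.0]
X, y = make_lag_features(daily_cases, n_lags=3)
# X == [[10.0, 12.0, 15.0], [12.0, 15.0, 19.0], [15.0, 19.0, 24.0]]
# y == [19.0, 24.0, 30.0]
X_train, y_train, X_val, y_val = chronological_split(X, y, train_fraction=0.8)
```

The resulting training rows could then be passed, after conversion to arrays if needed, to a gradient-boosting regressor such as `xgboost.XGBRegressor` via its `fit(X, y)` method. The number of lags, the split ratio and the model hyperparameters used in the study are not given in the abstract.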

Design: Time-series study.

Setting: The USA.

Main outcome measures: Three accuracy metrics were used to evaluate the performance of the two models: mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE).
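For reference, the three metrics can be computed as below. This is a minimal sketch using the standard textbook definitions; the abstract does not state the exact conventions the authors used (for example, whether MAPE is reported as a percentage or as a fraction), so those details are assumptions. Lower values indicate a better fit for all three metrics.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> int:
    """Validate inputs and return the number of observations."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(actual)


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: average magnitude of the forecast errors."""
    n = _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: squaring penalises large errors more heavily."""
    n = _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed here as a percentage.

    Undefined when any actual value is zero, so that case raises an error.
    """
    n = _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical observed and forecast values, for illustration only.
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
# approximately 6.667, 8.165 and 5.0
```

Because MAE and RMSE are in the units of the series (cases) while MAPE is scale-free, reporting all three gives a view of both absolute and relative forecast error.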

Results: For both the training set and the validation set, the XGBoost model had lower MAE, RMSE and MAPE values than the ARIMA model.

Conclusions: Compared with the ARIMA model, the XGBoost model can improve the prediction of COVID-19 cases in the USA.

Keywords: COVID-19; epidemiology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19* / epidemiology
  • China / epidemiology
  • Forecasting
  • Humans
  • Incidence
  • Models, Statistical*
  • United States / epidemiology