Accuracy comparison of ARIMA and XGBoost forecasting models in predicting the incidence of COVID-19 in Bangladesh

Md Siddikur Rahman; Arman Hossain Chowdhury; Miftahuzzannat Amrin

doi:10.1371/journal.pgph.0000495

Accuracy comparison of ARIMA and XGBoost forecasting models in predicting the incidence of COVID-19 in Bangladesh

PLOS Glob Public Health. 2022 May 18;2(5):e0000495. doi: 10.1371/journal.pgph.0000495. eCollection 2022.

Authors

Md Siddikur Rahman¹, Arman Hossain Chowdhury¹, Miftahuzzannat Amrin¹

Affiliation

¹ Department of Statistics, Begum Rokeya University, Rangpur, Bangladesh.

Abstract

Accurate predictive time series modelling is important in public health planning and response during the emergence of a novel pandemic. Therefore, the aims of the study are three-fold: (a) to model the overall trend of COVID-19 confirmed cases and deaths in Bangladesh; (b) to generate a short-term forecast of 8 weeks of COVID-19 cases and deaths; (c) to compare the predictive accuracy of the Autoregressive Integrated Moving Average (ARIMA) and eXtreme Gradient Boosting (XGBoost) for precise modelling of non-linear features and seasonal trends of the time series. The data were collected from the onset of the epidemic in Bangladesh from the Directorate General of Health Service (DGHS) and Institute of Epidemiology, Disease Control and Research (IEDCR). The daily confirmed cases and deaths of COVID-19 of 633 days in Bangladesh were divided into several training and test sets. The ARIMA and XGBoost models were established using those training data, and the test sets were used to evaluate each model's ability to forecast and finally averaged all the predictive performances to choose the best model. The predictive accuracy of the models was assessed using the mean absolute error (MAE), mean percentage error (MPE), root mean square error (RMSE) and mean absolute percentage error (MAPE). The findings reveal the existence of a nonlinear trend and weekly seasonality in the dataset. The average error measures of the ARIMA model for both COVID-19 confirmed cases and deaths were lower than XGBoost model. Hence, in our study, the ARIMA model performed better than the XGBoost model in predicting COVID-19 confirmed cases and deaths in Bangladesh. The suggested prediction model might play a critical role in estimating the spread of a novel pandemic in Bangladesh and similar countries.

Copyright: © 2022 Rahman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Grants and funding

The authors received no specific funding for this work.