Forecasting daily COVID-19 cases with gradient boosted regression trees and other methods: evidence from U.S. cities

Front Public Health. 2023 Dec 11:11:1259410. doi: 10.3389/fpubh.2023.1259410. eCollection 2023.

Abstract

Introduction: There is a vast literature on the performance of different short-term forecasting models for country-specific COVID-19 cases, but much less research on city-level cases. This paper employs daily case counts for 25 Metropolitan Statistical Areas (MSAs) in the U.S. to evaluate the efficacy of a variety of statistical forecasting models with respect to 7-day and 28-day ahead predictions.

Methods: This study employed Gradient Boosted Regression Trees (GBRT), Linear Mixed Effects (LME), Susceptible, Infectious, or Recovered (SIR), and Seasonal Autoregressive Integrated Moving Average (SARIMA) models to generate daily forecasts of COVID-19 cases from November 2020 to March 2021.
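For readers unfamiliar with the compartmental approach named above, the following is a minimal sketch of a textbook discrete-time SIR model that yields daily new-case counts. The abstract does not give the study's SIR specification, so the function name `simulate_sir`, the daily update rule, and all parameter values here are illustrative assumptions rather than the authors' implementation.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Simulate a basic discrete-time SIR model with a one-day time step.

    beta  : assumed transmission rate per day (hypothetical value)
    gamma : assumed recovery rate per day (hypothetical value)
    Returns the list of expected new infections for each simulated day.
    """
    susceptible = float(population - initial_infected)
    infectious = float(initial_infected)
    recovered = 0.0
    new_cases = []
    for _ in range(days):
        # New infections are proportional to contacts between S and I.
        infections = beta * susceptible * infectious / population
        recoveries = gamma * infectious
        susceptible -= infections
        infectious += infections - recoveries
        recovered += recoveries
        new_cases.append(infections)
    return new_cases


# Hypothetical 28-day ahead trajectory for a city of one million people.
forecast = simulate_sir(
    population=1_000_000, initial_infected=100, beta=0.3, gamma=0.1, days=28
)
```

With beta greater than gamma, as in these assumed values, the simulated daily counts rise over the horizon; in practice both rates would have to be estimated from observed case data.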

Results: Consistent with other research that has employed Machine Learning (ML) based methods, we find that Median Absolute Percentage Error (MAPE) values for both 7-day ahead and 28-day ahead predictions from GBRTs are lower than corresponding values from SIR, LME, and SARIMA specifications for the majority of MSAs during November-December 2020 and January 2021. GBRT and SARIMA models do not offer high-quality predictions for February 2021. However, SARIMA-generated MAPE values for 28-day ahead predictions are slightly lower than corresponding GBRT estimates for March 2021.
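The error metric used in these comparisons can be illustrated with a short sketch. It assumes that the Median Absolute Percentage Error is the median of the daily values |actual - predicted| / |actual|, expressed in percent, and that days with zero reported cases are skipped. The abstract does not state how such days were handled, and the function name `median_ape` and the sample numbers are hypothetical.

```python
from statistics import median


def median_ape(actual, predicted):
    """Median absolute percentage error, in percent.

    Days with zero actual cases are skipped (an assumption made here)
    because the percentage error is undefined for them.
    """
    errors = [
        100.0 * abs(a - p) / abs(a)
        for a, p in zip(actual, predicted)
        if a != 0
    ]
    if not errors:
        raise ValueError("no days with nonzero actual cases")
    return median(errors)


# Hypothetical daily case counts and forecasts for one city.
actual_cases = [100, 200, 400, 0]
predicted_cases = [110, 180, 400, 5]
score = median_ape(actual_cases, predicted_cases)
```

Because the median is used rather than the mean, a single day with a very large percentage error has little influence on the score.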

Discussion: The results of this research demonstrate that basic ML models can lead to relatively accurate forecasts at the local level, which is important for resource allocation decisions and epidemiological surveillance by policymakers.

Keywords: Gradient Boosted Regression Trees; Linear Mixed Effects; Metropolitan Statistical Areas; Seasonal Autoregressive Integrated Moving Average (SARIMA); Susceptible, Infectious, or Recovered (SIR); daily COVID-19 cases; epidemiological surveillance.

MeSH terms

  • COVID-19* / epidemiology
  • Cities / epidemiology
  • Humans
  • Incidence
  • Models, Statistical
  • Seasons

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. NT gratefully acknowledges financial assistance from NSERC Discovery Grant RGPIN-2019-04212. JD acknowledges financial support from NSERC Discovery Grant RGPIN-2020-04382.