A machine learning model for nowcasting epidemic incidence

Math Biosci. 2022 Jan:343:108677. doi: 10.1016/j.mbs.2021.108677. Epub 2021 Nov 27.

Abstract

Because of reporting delays, the daily national and statewide COVID-19 incidence counts are often unreliable and must be estimated from recent data, a process known in economics as nowcasting. In this paper we describe a simple random forest model for nowcasting daily new COVID-19 infection counts from historical data and a small set of covariates, such as the currently reported infection counts, the day of the week, and the time since first reporting. We apply the model to adjust the daily infection counts in Ohio and show that the predictions from this simple data-driven method compare favorably, in both quality and computational burden, with those obtained from a state-of-the-art hierarchical Bayesian model that employs a complex statistical algorithm. An interactive notebook for performing nowcasting is available online at https://tinyurl.com/simpleMLnowcasting.
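
To make the idea concrete, the following is a minimal sketch of random forest nowcasting, not the authors' implementation. It assumes three covariates modeled on those named in the abstract (currently reported count, day of the week, and days elapsed since the count was first reported, which is one possible reading of "time since first reporting") and trains on synthetic data invented purely for illustration. The small pure-Python forest and all function names (`grow_tree`, `fit_forest`, `predict_forest`) are hypothetical; in practice a library implementation would normally be used.

```python
import math
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (the split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def grow_tree(X, y, rng, depth=0, max_depth=4, min_leaf=3, n_try=2):
    """Grow one regression tree. A leaf is a float; a node is a tuple."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None
    # Consider only a random subset of features at each split.
    for j in rng.sample(range(len(X[0])), n_try):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return mean(y)
    _, j, t, left, right = best
    kw = dict(depth=depth + 1, max_depth=max_depth, min_leaf=min_leaf, n_try=n_try)
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], rng, **kw),
            grow_tree([X[i] for i in right], [y[i] for i in right], rng, **kw))

def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] <= t else right
    return node

def fit_forest(X, y, n_trees=15, seed=0):
    """Fit each tree on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng))
    return forest

def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return mean(predict_tree(tree, row) for tree in forest)

# Synthetic training data (an assumption for illustration only): each past day
# has a final count, observed at several report ages with partial completeness.
data_rng = random.Random(1)
X, y = [], []
for day in range(60):
    final = 200 + 100 * math.sin(day / 10) + data_rng.gauss(0, 10)
    weekday = day % 7
    for age in (1, 3, 7):                      # days since first reporting
        completeness = 1 - math.exp(-age / 4)  # assumed backfill curve
        weekend_dip = 0.8 if weekday >= 5 else 1.0
        X.append([final * completeness * weekend_dip, weekday, age])
        y.append(final)

forest = fit_forest(X, y)
# Nowcast the eventual count for a day with 120 cases reported so far,
# on weekday 2, one day after first reporting.
nowcast = predict_forest(forest, [120.0, 2, 1])
print(round(nowcast, 1))
```

Each training row pairs a partially reported count with the count eventually recorded for that day, so the forest learns the typical size of the backfill correction as a function of report age and weekday.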

Keywords: Backfilling; COVID-19 incidence; Nowcasting; Random forest.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bayes Theorem
  • COVID-19*
  • Humans
  • Incidence
  • Machine Learning
  • SARS-CoV-2