Wastewater-based prediction of COVID-19 cases using a random forest algorithm with strain prevalence data: A case study of five municipalities in Latvia

Sci Total Environ. 2023 Sep 15:891:164519. doi: 10.1016/j.scitotenv.2023.164519. Epub 2023 May 31.

Abstract

Wastewater-based epidemiology (WBE) is a rapid and cost-effective method that detects SARS-CoV-2 genomic components in wastewater and can provide an early warning of possible COVID-19 outbreaks one to two weeks in advance. However, the quantitative relationship between the intensity of the epidemic and the possible progression of the pandemic is still unclear, necessitating further research. This study investigates the use of WBE to rapidly monitor SARS-CoV-2 at five municipal wastewater treatment plants in Latvia and to forecast cumulative COVID-19 cases two weeks in advance. For this purpose, a real-time quantitative PCR approach was used to monitor the SARS-CoV-2 nucleocapsid 1 (N1), nucleocapsid 2 (N2), and E genes in municipal wastewater. The RNA signals in the wastewater were compared to the reported COVID-19 cases, and SARS-CoV-2 strain prevalence data were obtained by targeted next-generation sequencing of the receptor binding domain (RBD) and furin cleavage site (FCS) regions. A linear model and a random forest model were designed and applied to assess the relationship between cumulative cases, strain prevalence data, and RNA concentration in the wastewater, and to predict COVID-19 outbreaks and their scale. Additionally, the factors that affect prediction accuracy for COVID-19 were investigated and compared between the linear and random forest models. Cross-validated model metrics showed that the random forest model is more effective at predicting cumulative COVID-19 cases two weeks in advance when strain prevalence data are included. The results of this research help inform WBE and public health recommendations by providing valuable insights into the impact of environmental exposures on health outcomes.
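The model comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, and the weekly sampling interval, the `make_lagged` and `compare_models` helpers, the time-ordered cross-validation scheme, the R^2 metric, and the random forest settings are all assumptions made for illustration. It uses NumPy and scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score


def make_lagged(features, cases, horizon):
    """Pair features at step t with cumulative cases at step t + horizon."""
    return features[:-horizon], cases[horizon:]


def compare_models(X, y, n_splits=5, seed=0):
    """Return the mean cross-validated R^2 of a linear model and a random forest."""
    # Time-ordered splits (an assumption): each fold trains on earlier samples only.
    cv = TimeSeriesSplit(n_splits=n_splits)
    models = {
        "linear": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=50, random_state=seed),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="r2").mean())
        for name, model in models.items()
    }


# Synthetic stand-in data (NOT from the study): 60 hypothetical weekly samples.
rng = np.random.default_rng(0)
n_weeks = 60
rna = rng.lognormal(mean=10.0, sigma=0.5, size=(n_weeks, 3))  # N1, N2, E signals
strain = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=n_weeks)   # strain fractions per week
features = np.hstack([rna, strain])
cases = np.cumsum(rng.poisson(lam=50, size=n_weeks)).astype(float)  # cumulative cases

# With weekly samples, a horizon of 2 steps corresponds to two weeks ahead.
X, y = make_lagged(features, cases, horizon=2)
scores = compare_models(X, y)
print(scores)
```

Because the synthetic features are random and unrelated to the synthetic case counts, the printed scores say nothing about real performance. The sketch only shows how a two-week lag and a cross-validated comparison between the two model families might be set up.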

Keywords: Parameter importance; Random forest model; SARS-CoV-2; Wastewater-based epidemiology.

MeSH terms

  • COVID-19* / epidemiology
  • Cities / epidemiology
  • Humans
  • Latvia / epidemiology
  • Prevalence
  • Random Forest
  • SARS-CoV-2
  • Wastewater

Substances

  • Wastewater