Optimizing campus-wide COVID-19 test notifications with interpretable wastewater time-series features using machine learning models

Sci Rep. 2023 Nov 24;13(1):20670. doi: 10.1038/s41598-023-47859-2.

Abstract

During the COVID-19 pandemic, wastewater surveillance of the SARS-CoV-2 virus has been demonstrated to be effective for population surveillance from the county level down to the building level. At the University of California, San Diego (UCSD), daily high-resolution wastewater surveillance conducted at the building level is being used to identify potential undiagnosed infections and trigger notification of residents and responsive testing, but the optimal determinants for notifications are unknown. To fill this gap, we propose a pipeline for data processing and for identifying features of a series of wastewater test results that can predict the presence of COVID-19 in residences associated with the test sites. Using time series of wastewater results and individual testing results during periods of routine asymptomatic testing among UCSD students from 11/2020 to 11/2021, we develop hierarchical classification/decision tree models to select the most informative wastewater features (patterns of results) that predict individual infections. We find that the best predictor of positive individual-level tests in residence buildings is whether or not the wastewater samples were positive in at least 3 of the past 7 days. We also demonstrate that the tree models outperform a wide range of other statistical and machine learning models in predicting individual COVID-19 infections while preserving interpretability. Results of this study have been used to refine campus-wide guidelines and email notification systems to alert residents of potential infections.
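The "positive in at least 3 of the past 7 days" feature reported above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the toy input series, and the choice to count the current day as part of the 7-day window are all assumptions made for illustration.

```python
from typing import List, Sequence


def at_least_k_of_last_n(daily_positive: Sequence[bool],
                         k: int = 3, n: int = 7) -> List[bool]:
    """Flag each day on which at least k of the last n samples were positive.

    Assumption: the window covers the current day plus the n - 1 days
    before it; early days simply use the shorter history available.
    """
    flags = []
    for day in range(len(daily_positive)):
        # Slice out the trailing window ending on (and including) this day.
        window = daily_positive[max(0, day - n + 1): day + 1]
        flags.append(sum(window) >= k)
    return flags


# Hypothetical daily wastewater results for one building (True = positive).
series = [False, True, False, True, False, False, True,
          False, False, False, False, False, False, False]
flags = at_least_k_of_last_n(series)

# Days (0-indexed) on which the 3-of-7 rule would fire.
trigger_days = [day for day, fired in enumerate(flags) if fired]
print(trigger_days)
```

In this toy series the rule first fires on the day of the third positive sample and stops once the earliest positive falls out of the window. How missing samples or the exact window boundaries were handled in the study is not stated in the abstract, so those details are left open here.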

MeSH terms

  • COVID-19* / diagnosis
  • COVID-19* / epidemiology
  • Humans
  • Machine Learning
  • Pandemics
  • Time Factors
  • Wastewater
  • Wastewater-Based Epidemiological Monitoring

Substances

  • Wastewater