Optimizing campus-wide COVID-19 test notifications with interpretable wastewater time-series features using machine learning models

Tuo Lin; Smruthi Karthikeyan; Alysson Satterlund; Robert Schooley; Rob Knight; Victor De Gruttola; Natasha Martin; Jingjing Zou

doi:10.1038/s41598-023-47859-2

Sci Rep. 2023 Nov 24;13(1):20670. doi: 10.1038/s41598-023-47859-2.

Authors

Tuo Lin¹, Smruthi Karthikeyan², Alysson Satterlund³, Robert Schooley⁴, Rob Knight⁵,⁶,⁷, Victor De Gruttola⁸, Natasha Martin⁴, Jingjing Zou⁹

Affiliations

¹ Department of Biostatistics, University of Florida, Gainesville, FL, 32608, USA.
² Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, 91125, USA.
³ Student Affairs, University of California, San Diego, La Jolla, CA, 92093, USA.
⁴ Division of Infectious Diseases and Global Public Health, Department of Medicine, University of California, San Diego, La Jolla, CA, 92093, USA.
⁵ Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA.
⁶ Department of Computer Science and Engineering, University of California, San Diego, CA, USA.
⁷ Center for Microbiome Innovation, University of California, San Diego, CA, USA.
⁸ Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, 92093, USA.
⁹ Herbert Wertheim School of Public Health and Human Longevity Science, University of California, San Diego, La Jolla, CA, 92093, USA. j2zou@ucsd.edu.

Abstract

During the COVID-19 pandemic, wastewater surveillance of the SARS-CoV-2 virus has been demonstrated to be effective for population surveillance at scales from the county level down to the building level. At the University of California, San Diego (UCSD), daily high-resolution wastewater surveillance conducted at the building level is being used to identify potential undiagnosed infections and trigger notification of residents and responsive testing, but the optimal determinants for notifications are unknown. To fill this gap, we propose a pipeline for processing data and identifying features of a series of wastewater test results that can predict the presence of COVID-19 in residences associated with the test sites. Using time series of wastewater results and individual testing results during periods of routine asymptomatic testing among UCSD students from November 2020 to November 2021, we develop hierarchical classification/decision tree models to select the most informative wastewater features (patterns of results) that predict individual infections. We find that the best predictor of positive individual-level tests in residence buildings is whether or not the wastewater samples were positive in at least 3 of the past 7 days. We also demonstrate that the tree models outperform a wide range of other statistical and machine learning models in predicting individual COVID-19 infections while preserving interpretability. Results of this study have been used to refine campus-wide guidelines and email notification systems to alert residents of potential infections.

MeSH terms

COVID-19* / diagnosis
COVID-19* / epidemiology
Humans
Machine Learning
Pandemics
Time Factors
Wastewater
Wastewater-Based Epidemiological Monitoring

Substances

Wastewater