Public health and pipe breaks in water distribution systems: analysis with internet search volume as a proxy

Water Res. 2014 Apr 15:53:26-34. doi: 10.1016/j.watres.2014.01.013. Epub 2014 Jan 21.

Abstract

Drinking water distribution infrastructure has been identified as a factor in waterborne disease outbreaks, and improved understanding of the public health risks associated with distribution system failures has been identified as a priority area for research. Pipe breaks may pose a risk, as their occurrence and repair can result in low or negative pressure, potentially allowing contamination of drinking water from adjacent soils. However, measuring this phenomenon is challenging because the most likely health impact is mild gastrointestinal (GI) illness, which is unlikely to result in a doctor or hospital visit. Here we present a novel method that uses data mining techniques and internet search volume to assess the relationship between pipe breaks and symptoms of GI illness in two U.S. cities. Weekly search volume for the terms "diarrhea" and "vomiting" was used as the response variable, with the number of pipe breaks in each city as a covariate, along with additional covariates to control for seasonal patterns, search volume persistence, and other sources of GI illness. The fit and predictive accuracy of multiple regression and data mining techniques were compared, with the best performance obtained using random forest and bagged regression tree models. Pipe breaks were found to be an important and positively correlated predictor of internet search volume in multiple models in both cities, supporting previous investigations that indicated an increased risk of GI illness from distribution system disturbances.
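
The abstract does not say how the covariates were encoded, so the following is only a minimal sketch of one plausible setup, not the authors' procedure. It assumes that search volume persistence is captured by the previous week's search volume and that seasonality is captured by a pair of annual sine and cosine terms. The function name `build_design_rows`, the 52-week period, and the sample numbers are all hypothetical.

```python
import math


def build_design_rows(search_volume, pipe_breaks, period=52):
    """Pair each week's search volume with illustrative covariates.

    Returns (X, y), where every row of X is
    [pipe breaks this week, previous week's search volume,
     sine seasonal term, cosine seasonal term].
    The encoding is an assumption made for illustration only.
    """
    if len(search_volume) != len(pipe_breaks):
        raise ValueError("both series must cover the same weeks")
    X, y = [], []
    # Start at week 1 so that a one-week lag is always available.
    for t in range(1, len(search_volume)):
        angle = 2 * math.pi * t / period
        X.append([
            float(pipe_breaks[t]),        # covariate of interest
            float(search_volume[t - 1]),  # persistence (lag-1) term
            math.sin(angle),              # annual cycle, sine part
            math.cos(angle),              # annual cycle, cosine part
        ])
        y.append(float(search_volume[t]))  # response variable
    return X, y


# Made-up numbers purely for illustration; not data from the study.
volume = [40, 42, 45, 43, 50, 48]
breaks = [3, 5, 4, 6, 8, 5]
X, y = build_design_rows(volume, breaks)
```

Rows built this way could then be passed to a multiple regression or to a tree ensemble such as a random forest or bagged regression trees, the model families the abstract compares.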

Keywords: Distribution network; Gastrointestinal illness; Non-linear regression; Pipe breaks.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Cities
  • Data Mining*
  • Drinking Water / microbiology*
  • Gastrointestinal Diseases / epidemiology*
  • Gastrointestinal Diseases / etiology
  • Humans
  • Internet*
  • Models, Theoretical
  • Public Health / statistics & numerical data*
  • United States / epidemiology
  • Water Purification
  • Water Supply*

Substances

  • Drinking Water