Developing early warning systems to predict water lead levels in tap water for private systems

Water Res. 2022 Aug 1;221:118787. doi: 10.1016/j.watres.2022.118787. Epub 2022 Jun 22.

Abstract

Lead is a chemical contaminant that threatens public health, and high levels of lead have been identified in drinking water at locations across the globe. Under-served populations that use private systems for drinking water supplies may be at elevated risk because utilities and governing agencies are not responsible for ensuring that lead levels at these systems meet the Lead and Copper Rule. Predictive models that residents can use to assess water quality threats in their households can raise awareness of water lead levels (WLLs). This research explores and compares the use of statistical models (i.e., Bayesian belief classifiers) and machine learning models (i.e., ensembles of decision trees) for predicting WLLs. Models are developed using a dataset collected by the Virginia Household Water Quality Program (VAHWQP) at approximately 8000 households in Virginia during 2012–2017. The dataset reports laboratory-tested water quality parameters, location information, and household and plumbing characteristics, including observations of water odor, taste, and discoloration. Some water quality parameters, such as pH, iron, and copper, can be measured at low resolution by residents using at-home water test kits and can be used to predict the risk of elevated WLLs. The use of at-home test kits was simulated by discretizing water quality parameter measurements to match the resolution of the kits and by introducing error into the water quality readings. Using this approach, this research demonstrates that low-resolution data collected by residents can be used as input for models to estimate WLLs. Model predictive performance was explored for a set of at-home test kits that observe a variety of water quality parameters and report them at a range of resolutions. The effects of the timing of water sampling (e.g., first-draw vs. flushed samples) and of kit error on model error were tested through simulations. The prediction models developed through this research provide a set of tools for private well users to assess the risk of lead contamination. Models can be implemented as early warning systems in citizen science and online platforms to improve awareness of drinking water threats.
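The abstract describes simulating at-home test kit readings by discretizing laboratory measurements to a kit's resolution and introducing reading error. The Python sketch below illustrates that general idea under stated assumptions: the kit levels in `HYPOTHETICAL_KITS`, the additive Gaussian error model, and the function names `snap_to_kit`, `simulate_kit_reading`, and `simulate_household` are all hypothetical and are not the study's actual kits or procedure.

```python
import random
from bisect import bisect_left

# HYPOTHETICAL kit color-chart levels, chosen only for illustration;
# they are not the resolutions of the kits evaluated in the study.
HYPOTHETICAL_KITS = {
    "pH": [6.0, 6.5, 7.0, 7.5, 8.0, 8.5],  # pH units
    "iron": [0.0, 0.3, 1.0, 3.0, 5.0],     # mg/L
    "copper": [0.0, 0.5, 1.0, 2.0, 3.0],   # mg/L
}


def snap_to_kit(value, levels):
    """Map a continuous value to the nearest kit level (ties go to the lower level).

    `levels` must be sorted in ascending order. Values outside the chart are
    clamped to the first or last level, as a resident would read the end block.
    """
    i = bisect_left(levels, value)
    if i == 0:
        return levels[0]
    if i == len(levels):
        return levels[-1]
    lower, upper = levels[i - 1], levels[i]
    return lower if value - lower <= upper - value else upper


def simulate_kit_reading(lab_value, levels, noise_sd=0.0, rng=None):
    """Simulate one at-home kit reading from a laboratory measurement.

    ASSUMPTION: reading error is modeled as additive Gaussian noise with
    standard deviation `noise_sd` (in the parameter's units), applied before
    discretization. The study's actual error model may differ.
    """
    if rng is None:
        rng = random.Random()
    noisy = lab_value + rng.gauss(0.0, noise_sd) if noise_sd > 0 else lab_value
    return snap_to_kit(noisy, levels)


def simulate_household(lab_record, kits, noise_sd, rng):
    """Convert one household's lab results into simulated low-resolution kit readings."""
    return {
        param: simulate_kit_reading(lab_record[param], levels, noise_sd.get(param, 0.0), rng)
        for param, levels in kits.items()
    }


# Example: a hypothetical household record (illustrative lab values, not study data).
record = {"pH": 7.2, "iron": 0.42, "copper": 1.6}
print(simulate_household(record, HYPOTHETICAL_KITS, {}, random.Random(0)))
# -> {'pH': 7.0, 'iron': 0.3, 'copper': 2.0}
print(simulate_household(record, HYPOTHETICAL_KITS, {"pH": 0.2, "iron": 0.1}, random.Random(0)))
```

In a setup like this, the discretized readings would serve as input features for classifiers such as the Bayesian belief classifiers or ensembles of decision trees named in the abstract. Repeating the simulation with different `noise_sd` values is one way to probe how kit error might propagate to model error.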

Keywords: Bayesian belief network; Classification; Ensemble of decision trees; Lead in drinking water; Private systems; Water quality; Well water.

MeSH terms

  • Bayes Theorem
  • Copper
  • Drinking Water*
  • Lead / analysis
  • Water Pollutants, Chemical* / analysis
  • Water Quality
  • Water Supply

Substances

  • Drinking Water
  • Water Pollutants, Chemical
  • Lead
  • Copper