Simulating ensembles of source water quality using a K-nearest neighbor resampling approach

Environ Sci Technol. 2009 Mar 1;43(5):1407-11. doi: 10.1021/es8021182.

Abstract

Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, the paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, an appropriate "feature vector" is first formed from the best available explanatory variables. The nearest neighbors to the feature vector are then identified from the ICR data and resampled using a weight function. Repeating this resampling yields water quality ensembles, from which the distribution can be estimated and the variability quantified. The main strengths of the approach are its flexibility, simplicity, and ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds, and of alkalinity and bromide at two other U.S. utilities.
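
The resampling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: Euclidean distance on a feature vector that is presumed to be already standardized, a default of K equal to the square root of the number of records, and a 1/rank weight function (a kernel commonly used in K-nn bootstrap work). The function name `knn_bootstrap` and the `(feature_vector, value)` record layout are hypothetical.

```python
import math
import random


def knn_bootstrap(feature, records, n_sims=500, k=None, seed=None):
    """Draw an ensemble of water quality values by K-nn resampling.

    feature : sequence of floats, the feature vector for the target site/month
              (assumed to be standardized the same way as the records).
    records : list of (feature_vector, value) pairs, e.g. built from ICR data
              (hypothetical layout, chosen for illustration).
    n_sims  : number of ensemble members to draw.
    k       : number of neighbors; defaults to round(sqrt(n)) (an assumption).
    seed    : optional seed for reproducible draws.
    """
    if not records:
        raise ValueError("records must not be empty")

    rng = random.Random(seed)
    n = len(records)
    if k is None:
        k = max(1, round(math.sqrt(n)))
    k = min(k, n)

    # Rank all records by Euclidean distance to the feature vector
    # and keep the k closest ones.
    ranked = sorted(records, key=lambda rec: math.dist(feature, rec[0]))
    neighbor_values = [value for _, value in ranked[:k]]

    # Assumed weight function: weight proportional to 1/rank, so the
    # nearest neighbor is the most likely to be resampled.
    weights = [1.0 / rank for rank in range(1, k + 1)]

    # Repeated weighted resampling (with replacement) gives the ensemble.
    return rng.choices(neighbor_values, weights=weights, k=n_sims)
```

Calling `knn_bootstrap(feature, records, n_sims=500)` once per month would give one ensemble per month, from which percentiles or other measures of variability can be computed. The choice of explanatory variables in the feature vector is left to the user, as in the abstract.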

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alkalies / chemistry
  • Bromides / analysis
  • Computer Simulation*
  • Models, Chemical*
  • New Jersey
  • Organic Chemicals / chemistry
  • Pliability
  • Rivers / chemistry
  • Time Factors
  • Water / standards*

Substances

  • Alkalies
  • Bromides
  • Organic Chemicals
  • Water