Developing stacking ensemble models for multivariate contamination detection in water distribution systems

Sci Total Environ. 2022 Jul 1;828:154284. doi: 10.1016/j.scitotenv.2022.154284. Epub 2022 Mar 2.

Abstract

This study presents a new stacking ensemble model for detecting contamination events from multiple water quality parameters. The stacking model consists of several machine learning base predictors and a meta-predictor. It is trained with cross-validation to capture different features of the water quality parameters and is then used to predict water quality. For each water quality parameter, the residuals between predicted and measured values are classified to identify anomalies. The classification uses thresholds derived from the sequential model-based optimization method, and the detection probabilities are updated using Bayesian analysis. Alarms derived from the individual water quality parameters are then fused to strengthen the anomaly signals and improve detection accuracy. The proposed stacking-based method is evaluated on a data set of six water quality parameters from a real water distribution system, with randomly simulated contamination events. The stacking-based method detected 2,496 of a total of 2,500 events without a false alarm. The results show that the stacking method outperforms an artificial neural network (ANN) benchmark method in contamination event detection: it achieves a higher true positive rate, a lower false positive rate and a higher F1 score than the ANN method. This implies that the stacking method holds great promise for detecting contamination events in water distribution systems.
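The detection stage summarized above (classifying per-parameter residuals, sequentially updating detection probabilities with Bayes' rule, and fusing the per-parameter alarms) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `predicted` values stand in for the stacking model's output, the residual thresholds are fixed constants rather than values found by sequential model-based optimization, the likelihoods P(flag | event) and P(flag | normal) are hypothetical constants, and fusion is a simple k-of-n vote. All function names, parameter values and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def flag_residuals(predicted: Sequence[float], measured: Sequence[float],
                   threshold: float) -> List[bool]:
    """Mark a time step as anomalous when |measured - predicted| exceeds threshold.

    Assumption: a fixed threshold per parameter (the paper tunes thresholds
    with sequential model-based optimization instead).
    """
    return [abs(m - p) > threshold for p, m in zip(predicted, measured)]


def bayes_update(prior: float, flagged: bool,
                 p_flag_event: float, p_flag_normal: float) -> float:
    """Apply Bayes' rule once: P(event | observation) from P(event) and one flag.

    Assumption: constant, hypothetical likelihoods P(flag | event) and
    P(flag | normal).
    """
    if flagged:
        like_event, like_normal = p_flag_event, p_flag_normal
    else:
        like_event, like_normal = 1.0 - p_flag_event, 1.0 - p_flag_normal
    numerator = like_event * prior
    return numerator / (numerator + like_normal * (1.0 - prior))


def event_probabilities(flags: Sequence[bool], prior: float = 0.01,
                        p_flag_event: float = 0.8, p_flag_normal: float = 0.05,
                        floor: float = 1e-3) -> List[float]:
    """Sequentially update P(event) over a stream of flags.

    The posterior at step t becomes the prior at step t + 1; clamping to
    [floor, 1 - floor] keeps the estimate from locking at exactly 0 or 1.
    """
    probs = []
    p = prior
    for flagged in flags:
        p = bayes_update(p, flagged, p_flag_event, p_flag_normal)
        p = min(max(p, floor), 1.0 - floor)
        probs.append(p)
    return probs


def fuse_alarms(prob_series: Dict[str, List[float]], alarm_level: float = 0.9,
                min_votes: int = 2) -> List[bool]:
    """Raise a fused alarm at step t when at least min_votes parameters
    have P(event) >= alarm_level at that step (a hypothetical k-of-n rule)."""
    n_steps = min(len(probs) for probs in prob_series.values())
    return [
        sum(probs[t] >= alarm_level for probs in prob_series.values()) >= min_votes
        for t in range(n_steps)
    ]


if __name__ == "__main__":
    # Toy data: constant predictions stand in for the stacking model's output.
    predicted = {"chlorine": [1.0] * 6, "turbidity": [0.3] * 6, "ph": [7.5] * 6}
    measured = {
        "chlorine": [1.02, 0.98, 0.60, 0.55, 0.50, 0.52],   # drops from step 2
        "turbidity": [0.31, 0.29, 0.80, 0.85, 0.90, 0.88],  # rises from step 2
        "ph": [7.48, 7.52, 7.49, 7.51, 7.50, 7.47],         # unaffected
    }
    thresholds = {"chlorine": 0.2, "turbidity": 0.2, "ph": 0.2}
    probs = {
        name: event_probabilities(
            flag_residuals(predicted[name], measured[name], thresholds[name]))
        for name in predicted
    }
    print(fuse_alarms(probs))  # prints [False, False, False, False, False, True]
```

In this toy run the fused alarm fires only after chlorine and turbidity have both been flagged for several consecutive steps, which shows why accumulating evidence over time and requiring agreement across parameters can suppress false alarms from isolated outliers.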

Keywords: Contamination detection; Ensemble modeling; Machine learning; Stacking modeling; Water distribution system; Water quality.

MeSH terms

  • Bayes Theorem
  • Machine Learning
  • Neural Networks, Computer*
  • Probability
  • Water Quality*