A new Bayesian approach for managing bathing water quality at river bathing locations vulnerable to short-term pollution

Water Res. 2024 Mar 15:252:121186. doi: 10.1016/j.watres.2024.121186. Epub 2024 Jan 25.

Abstract

Short-term fecal pollution events are a major challenge for managing microbial safety at recreational waters. Long turnaround times of current laboratory methods for analyzing fecal indicator bacteria (FIB) delay water quality assessments. Data-driven models have been shown to be valuable approaches to enable fast water quality assessments. However, a major barrier towards the wider use of such models is the prevalent data scarcity at existing bathing waters, which calls into question the representativeness and thus usefulness of such datasets for model training. The present study explores the ability of five data-driven modelling approaches to predict short-term fecal pollution episodes at recreational bathing locations under data-scarce situations and with imbalanced datasets. The study explicitly focuses on the potential benefits of adopting an innovative modeling and risk-based assessment approach, based on state/cluster-based Bayesian updating of FIB distributions in relation to different hydrological states. The models are benchmarked against commonly applied supervised learning approaches, particularly linear regression and random forests, as well as against a zero-model which closely resembles the current way of classifying bathing water quality in the European Union. For model-based clustering we apply a non-parametric Bayesian approach based on a Dirichlet Process Mixture Model. The study tests and demonstrates the proposed approaches at three river bathing locations in Germany, known to be influenced by short-term pollution events. At each river, two modelling experiments ("longest dry period", "sequential model training") are performed to explore how the different modelling approaches react and adapt to scarce and uninformative training data, i.e., datasets that do not include event pollution information in terms of elevated FIB concentrations.
We demonstrate that it is especially the proposed Bayesian approaches that are able to raise correct warnings in such situations (> 90 % true positive rate). The zero-model and random forest are shown to be unable to predict contamination episodes if pollution episodes are not present in the training data. Our research shows that the investigated Bayesian approaches reduce the risk of missed pollution events, thereby improving bathing water safety management. Additionally, the approaches provide a transparent solution for setting minimum data quality requirements under various conditions. The proposed approaches open the way for developing data-driven models for bathing water quality prediction against the reality that data scarcity is a common problem at existing and prospective bathing waters.
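
The idea of state-based Bayesian updating of FIB distributions can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes log10 FIB concentrations within each hydrological state are normally distributed with a known observation standard deviation, uses a conjugate normal prior on the state mean, and takes the two states ("dry", "wet") as given rather than inferring them with a Dirichlet Process Mixture Model. All function names, prior values, sample values, and the warning threshold are hypothetical placeholders.

```python
import math


def update_normal_mean(prior_mean, prior_sd, obs, obs_sd):
    """Conjugate update of a normal mean, assuming a known observation sd.

    Returns the posterior mean and sd of the state's mean log10 FIB level.
    With no observations, the prior is returned unchanged.
    """
    n = len(obs)
    if n == 0:
        return prior_mean, prior_sd
    prior_prec = 1.0 / prior_sd ** 2      # precision of the prior
    data_prec = n / obs_sd ** 2           # precision contributed by the data
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * (sum(obs) / n))
    return post_mean, math.sqrt(post_var)


def exceedance_probability(post_mean, post_sd, obs_sd, threshold_log10):
    """Posterior predictive probability that a new sample exceeds the threshold."""
    pred_sd = math.sqrt(post_sd ** 2 + obs_sd ** 2)
    z = (threshold_log10 - post_mean) / pred_sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the normal


# Hypothetical values, chosen only for illustration.
OBS_SD = 0.5                          # assumed known sd of log10 FIB
THRESHOLD_LOG10 = math.log10(1000.0)  # placeholder warning threshold
priors = {"dry": (2.0, 1.0), "wet": (3.0, 1.0)}  # (prior mean, prior sd)

# Scarce, uninformative training data: dry-weather samples only.
training = {"dry": [1.8, 2.1, 1.9, 2.0], "wet": []}

risk = {}
for state, (m0, s0) in priors.items():
    m, s = update_normal_mean(m0, s0, training[state], OBS_SD)
    risk[state] = exceedance_probability(m, s, OBS_SD, THRESHOLD_LOG10)
```

In this sketch the "wet" state has no training samples, so its exceedance probability comes entirely from the prior and stays well above that of the "dry" state. This mirrors, in a simplified way, why a prior-informed approach can still raise warnings when pollution events are absent from the training data, whereas a model fitted only to the observed dry-weather samples cannot.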

Keywords: Dirichlet Process Mixture Model; Probabilistic modelling; Recreational waters.

MeSH terms

  • Bacteria
  • Bathing Beaches
  • Bayes Theorem
  • Environmental Monitoring / methods
  • Feces / microbiology
  • Prospective Studies
  • Rivers* / microbiology
  • Water Microbiology
  • Water Pollution
  • Water Quality*