Development of Predictive Models for "Very Poor" Beach Water Quality Gradings Using Class-Imbalance Learning

Environ Sci Technol. 2021 Nov 2;55(21):14990-15000. doi: 10.1021/acs.est.1c03350. Epub 2021 Oct 11.

Abstract

Statistical water quality forecast models are useful tools to assist with beach management. In particular, multiple linear regression (MLR) models have been successfully developed for prediction of fecal indicator bacteria concentrations for beaches in river, lake, and marine environments. Nevertheless, an unresolved challenge is the reliable prediction of infrequent events of high bacterial concentrations to inform beach closure decisions that protect public health. The amount of field data available for these infrequent events is typically an order of magnitude smaller than that for days when the water quality criterion is met, and MLR models often perform poorly in predicting bacterial concentrations on days when the beaches should be closed. For beach management in Hong Kong, MLR models have been developed to predict beach water quality indices in terms of four gradings (BWQI-1 to 4) based on Escherichia coli (E. coli) concentrations. In this study, we propose an artificial intelligence (AI)-based binary classification (EasyEnsemble) model using class-imbalance learning to predict "very poor" occasions (BWQI-4), that is, occasions when the E. coli concentration exceeds 610 counts/100 mL. Models are developed for three marine beaches with different hydrographic and pollution characteristics using a 30-year data set spanning three periods with different water quality status. The model-data comparison over a wide range of conditions shows that the proposed method significantly improves the prediction of "very poor" water quality. The proposed class-imbalance method for predicting rare events has an F-score of 0.84, and it significantly outperforms MLR and classification tree (CT) models, which have corresponding F-scores of 0.39 and 0.69. A robust beach water quality forecast system can hence be developed using hybrid MLR-binary classification modeling.
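The following is a minimal illustrative sketch, not the authors' implementation, of three ideas mentioned in the abstract: labeling a sample as "very poor" when E. coli exceeds 610 counts/100 mL, drawing balanced training subsets by randomly undersampling the majority class (the sampling step on which EasyEnsemble-style methods rely), and scoring predictions with an F-score. The function names, the number of subsets, and the sample counts are hypothetical, and the F-score is assumed here to be the harmonic mean of precision and recall (F1), which the abstract does not specify.

```python
import random

# Threshold taken from the abstract: BWQI-4 ("very poor") when
# E. coli exceeds 610 counts/100 mL.
VERY_POOR_THRESHOLD = 610


def is_very_poor(ecoli_count):
    """Return True when a sample falls in the "very poor" class."""
    return ecoli_count > VERY_POOR_THRESHOLD


def balanced_bags(labels, n_bags, seed=0):
    """Build index subsets that keep every minority (True) sample and add
    an equally sized random draw from the majority (False) samples.

    Hypothetical helper: it covers only the undersampling step; a full
    EasyEnsemble model would also train a classifier on each subset and
    combine their outputs.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y]
    majority = [i for i, y in enumerate(labels) if not y]
    size = min(len(minority), len(majority))
    return [sorted(minority + rng.sample(majority, size)) for _ in range(n_bags)]


def f_score(y_true, y_pred):
    """F1 score for the positive class (assumed definition of "F-score")."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up E. coli counts (counts/100 mL) for illustration only.
counts = [50, 700, 120, 90, 1000, 30, 610, 611]
labels = [is_very_poor(c) for c in counts]
bags = balanced_bags(labels, n_bags=3)
```

Each subset contains all "very poor" samples plus an equal number of randomly chosen other samples, so a classifier trained on it sees a balanced class ratio. In the original EasyEnsemble formulation a boosted classifier is trained on each subset and the results are combined; the imbalanced-learn package provides an `EasyEnsembleClassifier` along these lines, though the abstract does not state which software was used.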

Keywords: EasyEnsemble; artificial intelligence; beach management; class-imbalance learning; classification trees; coastal beach; multiple linear regression (MLR) models; public health; statistical models; water quality prediction.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence
  • Bathing Beaches*
  • Escherichia coli
  • Water Microbiology
  • Water Quality*