Investigating machine learning models in predicting lake water quality parameters as a 3-year moving average

Environ Sci Pollut Res Int. 2023 May;30(23):63839-63863. doi: 10.1007/s11356-023-26830-8. Epub 2023 Apr 14.

Abstract

Lake water quality plays a vital role in the lake ecosystem, including biotic interactions (among living organisms such as plants, animals, and micro-organisms) and abiotic interactions. In this research, several machine learning (ML) methods, namely the classification and regression tree (CART), chi-squared automatic interaction detector (CHAID), C5 tree, and quick, unbiased, and efficient statistical tree (QUEST), along with the multilayer perceptron (MLP) and radial basis function (RBF) neural networks, are employed to predict the concentration of water quality parameters (P, EC, TDS, pH, DO, NH3, SO4, and θ). The study area is Lake Erie, situated on the international border between the USA and Canada. The C5 and QUEST trees are used to classify the data and predict the number of groups, while the other methods are used to predict the concentration of water quality parameters in the form of a 3-year moving average. For dissolved oxygen, the close agreement between the observed and predicted data (NSE = 0.978, bias = 0.126) shows that the CART decision tree predicts the concentration of this parameter with higher accuracy. The C5 tree correctly identified 33 of the 36 groups, which shows its better accuracy in classifying the data for this parameter.
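
The abstract reports predictions as a 3-year moving average and evaluates them with NSE and bias, but it does not give the formulas. The following is a minimal Python sketch of how such quantities are commonly computed, not the authors' implementation. It assumes a trailing (not centered) window, the standard Nash-Sutcliffe efficiency, and bias defined as the mean of predicted minus observed; the paper may define bias or align the window differently. The function names and the sample values are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average over `window` consecutive annual values.

    Returns len(values) - window + 1 points (assumed alignment).
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations.

    1.0 is a perfect match; 0.0 is no better than the observed mean.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed values are constant; NSE is undefined")
    return 1 - sse / spread


def bias(observed, predicted):
    """Mean error (predicted minus observed); this sign convention is assumed."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Made-up annual dissolved oxygen values (mg/L), for illustration only.
    annual_do = [8.0, 9.0, 10.0, 9.0, 8.0]
    smoothed = moving_average(annual_do, window=3)
    # Made-up model output for the same smoothed series.
    model_output = [9.1, 9.2, 9.0]
    print("3-year moving average:", smoothed)
    print("NSE:", nse(smoothed, model_output))
    print("bias:", bias(smoothed, model_output))
```

Under these definitions, an NSE close to 1 together with a bias close to 0 indicates that the predicted series tracks the observed moving average closely, which is how the reported dissolved oxygen scores are read.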

Keywords: Data mining; Decision tree; Hydro chemical parameters; Machine learning; Neural network; Water quality.

MeSH terms

  • Animals
  • Ecosystem
  • Lakes*
  • Machine Learning
  • Neural Networks, Computer
  • Water Quality*