Water Quality Prediction Based on Machine Learning and Comprehensive Weighting Methods

Entropy (Basel). 2023 Aug 9;25(8):1186. doi: 10.3390/e25081186.

Abstract

Amid escalating global environmental concerns, preserving water resources and maintaining ecological balance have become increasingly important, and monitoring and predicting water quality are vital to both goals. Ensuring accurate and dependable water quality prediction, however, remains challenging. To address this issue, this study proposes a comprehensive weight-based approach that combines entropy weighting with the Pearson correlation coefficient to select crucial features for water quality prediction. The approach considers both feature correlation and information content, avoiding excessive reliance on a single criterion for feature selection. It yields an overall evaluation of each feature's contribution and importance, thereby minimizing subjective bias and uncertainty. By balancing these factors, features with stronger correlation and greater information content can be selected, improving the accuracy and robustness of the feature-selection process. Furthermore, this study explored several machine learning models for water quality prediction: Support Vector Machine (SVM), Multilayer Perceptron (MLP), Random Forest (RF), XGBoost, and Long Short-Term Memory (LSTM). SVM performed well in predicting Dissolved Oxygen (DO), with strong generalization and high prediction accuracy. MLP demonstrated its strength in nonlinear modeling and performed well in predicting multiple water quality parameters. The RF and XGBoost models performed comparatively poorly in water quality prediction. In contrast, the LSTM model, a recurrent neural network specialized in processing time series data, performed exceptionally well: it effectively captured the dynamic patterns present in the time series data and offered stable and accurate predictions for various water quality parameters.
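
The weighting idea described above can be illustrated with a minimal sketch. The abstract does not state how the entropy weights and the Pearson correlations are combined, so everything specific here is an assumption made for illustration: the min-max normalization, the use of the absolute correlation with the target, the convex combination controlled by a hypothetical parameter `alpha`, and all function names. It should not be read as the authors' exact procedure.

```python
import numpy as np


def entropy_weights(X):
    """Entropy weights for the columns of X (n_samples x n_features)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two samples")
    # Assumption: min-max normalize each feature to [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    constant = span == 0
    safe_span = np.where(constant, 1.0, span)
    Z = (X - X.min(axis=0)) / safe_span
    # Turn each column into a discrete distribution over samples.
    col_sums = Z.sum(axis=0)
    P = Z / np.where(col_sums == 0, 1.0, col_sums)
    # Shannon entropy per feature, with 0 * log(0) treated as 0.
    logs = np.log(P, where=P > 0, out=np.zeros_like(P))
    E = -(P * logs).sum(axis=0) / np.log(n)
    # Lower entropy means more divergence, hence more weight.
    d = 1.0 - E
    d[constant] = 0.0  # a constant feature carries no information
    total = d.sum()
    return d / total if total > 0 else d


def pearson_weights(X, y):
    """Normalized absolute Pearson correlation of each feature with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    r = np.divide(Xc.T @ yc, denom,
                  out=np.zeros(X.shape[1]), where=denom > 0)
    a = np.abs(r)
    total = a.sum()
    return a / total if total > 0 else a


def comprehensive_weights(X, y, alpha=0.5):
    """Assumed rule: convex combination of the two weight vectors."""
    return alpha * entropy_weights(X) + (1.0 - alpha) * pearson_weights(X, y)


def select_features(X, y, k, alpha=0.5):
    """Indices of the k features with the largest combined weight."""
    w = comprehensive_weights(X, y, alpha)
    return np.argsort(w)[::-1][:k]


if __name__ == "__main__":
    # Synthetic data only; not drawn from the paper.
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=200)
    X = np.column_stack([x0, rng.normal(size=200), np.full(200, 3.0)])
    y = 2.0 * x0 + 0.1 * rng.normal(size=200)
    print(comprehensive_weights(X, y))
    print(select_features(X, y, k=2))
```

With `alpha=0.5` both criteria count equally; moving `alpha` toward 1 favors information content, and moving it toward 0 favors correlation with the predicted parameter. A multiplicative combination would be an equally plausible reading of the abstract.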

Keywords: LSTM; comprehensive weight-based approach; feature selection; machine learning; water quality prediction.