Proposed formulation of surface water quality and modelling using gene expression, machine learning, and regression techniques

Environ Sci Pollut Res Int. 2021 Mar;28(11):13202-13220. doi: 10.1007/s11356-020-11490-9. Epub 2020 Nov 11.

Abstract

Rising water pollution from anthropogenic sources motivates further research into models for predicting water quality. Existing models are limited by the short time spans of their data and by their inability to provide empirical expressions. This study models and derives empirical equations for the surface water quality of the upper Indus River basin from a 30-year dataset using machine learning techniques, and then identifies the most reliable model for accurately predicting river water quality. Total dissolved solids (TDS) and electrical conductivity (EC) were used as dependent variables and eight parameters as independent variables, with 70% of the data used for model training and 30% for testing. Model performance was assessed with several evaluation criteria: Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), coefficient of determination (R²), and mean absolute error (MAE). The data were also validated by k-fold cross-validation using R² and RMSE. The results indicated a strong correlation, with NSE and R² both above 0.85 for all the developed models. Gene expression programming (GEP) outperformed the artificial neural network (ANN) and the linear and non-linear regression models for both TDS and EC. The sensitivity and parametric analyses revealed that bicarbonate is the most sensitive parameter influencing both the TDS and EC models. Two equations were derived from the GEP model to help authorities monitor river water quality effectively.
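The four evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the standard textbook definitions are assumed here, and the coefficient of determination is assumed to be the squared Pearson correlation between observed and predicted values, a common convention in hydrology that may differ from the one the authors used. The function name `evaluate` and the sample values are hypothetical and are not taken from the study.

```python
import math


def evaluate(observed, predicted):
    """Return NSE, RMSE, MAE and R2 for paired observed/predicted values.

    Assumption: R2 is the squared Pearson correlation coefficient;
    the abstract does not say which definition the study used.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    # Residual-based sums.
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    abs_err = sum(abs(o - p) for o, p in zip(observed, predicted))

    # Spread of each series about its own mean, and their co-variation.
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    co_var = sum((o - mean_obs) * (p - mean_pred)
                 for o, p in zip(observed, predicted))

    return {
        "NSE": 1.0 - sq_err / ss_obs,         # 1 means a perfect fit
        "RMSE": math.sqrt(sq_err / n),        # same units as the data
        "MAE": abs_err / n,                   # same units as the data
        "R2": co_var ** 2 / (ss_obs * ss_pred),
    }


# Hypothetical values: every prediction is too high by exactly 1 unit.
scores = evaluate([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(scores)
```

In this made-up case the predictions track the observations perfectly in shape but carry a constant bias, so the squared correlation stays at 1 while NSE drops below zero. That difference is one reason to report NSE and the error measures alongside the coefficient of determination rather than relying on correlation alone.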

Keywords: Machine learning algorithms; Regression; Sensitivity and parametric analyses; Surface water quality; k-fold cross-validation.

MeSH terms

  • Environmental Monitoring*
  • Gene Expression
  • Machine Learning
  • Rivers
  • Water Quality*