Water quality index modeling using random forest and improved SMO algorithm for support vector machine in Saf-Saf river basin

Environ Sci Pollut Res Int. 2022 Jul;29(32):48491-48508. doi: 10.1007/s11356-022-18644-x. Epub 2022 Feb 22.

Abstract

The water quality index is one of the prominent general indicators used to assess and classify surface water quality, and it plays a critical role in river water resource management practices. This research constructs a hybrid artificial intelligence model, namely sequential minimal optimization-support vector machine (SMO-SVM), along with random forest (RF) as a benchmark model, for predicting water quality index values in the Wadi Saf-Saf river basin in Algeria. Fifteen input water quality parameters, namely biochemical oxygen demand (BOD), oxygen saturation (OS), potential of hydrogen (pH), chemical oxygen demand (COD), chloride (Cl⁻), dissolved oxygen (DO), electrical conductivity (EC), total dissolved solids (TDS), nitrate-nitrogen (NO₃-N), nitrite-nitrogen (NO₂-N), phosphate (PO₄³⁻), ammonium (NH₄⁺), temperature (T), turbidity (NTU), and suspended solids (SS), were employed for constructing the predictive models. Different input data combinations are evaluated in terms of predictive performance, using a set of statistical metrics and graphical representations. Results show that less than 40% of samples were of poor water quality during the dry season in the downstream northeastern part of the basin. The findings also show that the RF model mostly generates more precise water quality index predictions than the SMO-SVM model in both the training and testing stages. Although thirteen input parameters attain the optimal predictive performance (R² testing = 0.82, RMSE testing = 5.17), a combination of only five input parameters (pH, EC, TDS, T, and OS) gives the second-best predictive precision (R² testing = 0.81, RMSE testing = 5.55). The sensitivity analysis results indicate a greater sensitivity of the predictive outcomes to all of the chosen input variables except NO₂⁻. Overall, the RF model improves on earlier tools for predicting the water quality index, in terms of both predictive performance and a reduced number of input variables.
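
As an illustration of the two statistical metrics quoted above, the following is a minimal sketch of how RMSE and R² can be computed from observed and predicted index values. It assumes R² is the coefficient of determination (the abstract does not state which definition the authors used; a squared correlation coefficient is another common choice), and the function names and sample values are hypothetical, not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / n)


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical water quality index values, for illustration only.
observed_wqi = [50.0, 60.0, 70.0, 80.0]
predicted_wqi = [52.0, 58.0, 73.0, 79.0]

print(f"RMSE = {rmse(observed_wqi, predicted_wqi):.3f}")
print(f"R2   = {r2(observed_wqi, predicted_wqi):.3f}")
```

A lower RMSE and an R² closer to 1 indicate closer agreement between predictions and observations, which is the basis on which the thirteen-input and five-input combinations are ranked.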

Keywords: Improved support vector machine; Random forest; Sensitivity analysis; Sequential minimal optimization; Water quality.

MeSH terms

  • Artificial Intelligence
  • Environmental Monitoring / methods
  • Nitrogen / analysis
  • Nitrogen Dioxide / analysis
  • Oxygen / analysis
  • Support Vector Machine
  • Water Pollutants, Chemical* / analysis
  • Water Quality*

Substances

  • Water Pollutants, Chemical
  • Nitrogen
  • Nitrogen Dioxide
  • Oxygen