Concentration estimation of dissolved oxygen in Pearl River Basin using input variable selection and machine learning techniques

Sci Total Environ. 2020 Aug 20:731:139099. doi: 10.1016/j.scitotenv.2020.139099. Epub 2020 May 4.

Abstract

Dissolved oxygen (DO) concentration is an essential index for water environment assessment. Here, we present a modeling approach to estimate DO concentrations using input variable selection and data-driven models. Specifically, an input variable selection technique, the maximal information coefficient (MIC), was used to identify and screen the primary environmental factors driving variation in DO. A data-driven model, support vector regression (SVR), was then used to construct a robust model to estimate DO concentration. The approach was illustrated through a case study of the Pearl River Basin in China. We show that the MIC technique can effectively screen the major local environmental factors affecting DO concentrations. MIC values tended to stabilize once the sample size exceeded 3000, and EC had the highest score, with an MIC >0.3, at both stations. Relative to the complete candidate sets, the variable-reduced datasets improved the performance of the SVR model, reducing the root mean square error (RMSE) by 28.65% and increasing R2 and the Nash-Sutcliffe efficiency (NSE) by 22.16% and 56.27%, respectively. The MIC-SVR model constructed for the tidal river network performed better than the one for the nontidal river network, with approximately 63.01% lower RMSE, 62.36% higher NSE, and R2 >0.9. Overall, the proposed technique was able to handle nonlinearity among environmental factors and accurately estimate DO concentrations in tidal river network regions.
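The three scores quoted in the abstract (RMSE, R2, NSE) can be computed as in the minimal sketch below, which is not the authors' code. It assumes R2 means the squared Pearson correlation between observed and estimated DO; the abstract does not state which definition was used. The function names, the `pct_change` helper, and the sample values are hypothetical and serve only as illustration.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / (spread of obs about their mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r2(obs, sim):
    """Squared Pearson correlation (assumed definition of R2)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


def pct_change(baseline, new):
    """Relative change in percent; negative values indicate a reduction."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical DO values in mg/L, for illustration only (not data from the study).
observed = [6.0, 7.0, 8.0, 9.0]
estimated = [6.5, 6.5, 8.5, 8.5]

print("RMSE:", rmse(observed, estimated))
print("NSE: ", nse(observed, estimated))
print("R2:  ", r2(observed, estimated))
```

A statement such as "a reduction of 28.65% in RMSE" then corresponds to `pct_change(rmse_full, rmse_reduced)` being about -28.65, where `rmse_full` and `rmse_reduced` stand for the scores obtained with the complete and variable-reduced input sets.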

Keywords: DO; Maximal information coefficient; Sample size; Support vector regression; Temporal resolution.