Effect of environmental covariable selection in the hydrological modeling using machine learning models to predict daily streamflow

J Environ Manage. 2021 Jul 15;290:112625. doi: 10.1016/j.jenvman.2021.112625. Epub 2021 Apr 22.

Abstract

There are different methods for predicting streamflow, and machine learning has recently been widely used for this purpose. This technique draws on a wide set of covariables, which must undergo selection to increase the precision and stability of the models. Thus, this work aimed to analyze the effect of covariable selection with Recursive Feature Elimination (RFE) and Forward Feature Selection (FFS) on the performance of machine learning models for predicting daily streamflow. The study was carried out in the Piranga river basin, located in the State of Minas Gerais, Brazil. The database consisted of an 18-year historical series (2000-2017) of streamflow data at the outlet of the basin, together with covariables derived from the streamflow of affluent rivers, precipitation, land use and land cover, products from the MODIS sensors, and time. Highly correlated covariables were eliminated, and the remaining covariables were selected by level of importance using the RFE and FFS methods for the Multivariate Adaptive Regression Splines (EARTH), Multiple Linear Regression (MLR), and Random Forest (RF) models. The data were partitioned into two groups: 75% for training and 25% for validation. The models were run 50 times, and their performance was evaluated by the Nash-Sutcliffe efficiency coefficient (NSE), the coefficient of determination (R2), and the Root Mean Square Error (RMSE). The three models tested showed satisfactory performance with both covariable selection methods; however, all of them proved inaccurate for predicting values associated with maximum streamflow events. In most cases, the use of FFS improved the performance of the models and reduced the number of selected covariables. The use of machine learning to predict daily streamflow proved to be efficient, and the use of FFS in the selection of covariables enhanced this efficiency.
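The three evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so the definitions below are the commonly used ones and are assumptions here: in particular, R2 is computed as the squared Pearson correlation between observed and simulated streamflow, which may differ from the authors' exact definition. The function names `rmse`, `nse`, and `r_squared` are hypothetical, and the sample values are invented for illustration only; they are not data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared residual."""
    n = len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(squared_error / n)


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    squares to the sum of squared deviations of the observations from their mean.
    A value of 1 is a perfect fit; 0 means no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def r_squared(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Coefficient of determination, taken here (assumption) as the squared
    Pearson correlation between observed and simulated values."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


# Invented illustrative values (not from the study): a simulation that
# tracks the observations perfectly in shape but is biased by +1 unit.
observed_flow = [1.0, 2.0, 3.0, 4.0]
simulated_flow = [2.0, 3.0, 4.0, 5.0]

print("RMSE:", rmse(observed_flow, simulated_flow))
print("NSE: ", nse(observed_flow, simulated_flow))
print("R2:  ", r_squared(observed_flow, simulated_flow))
```

In this toy case R2 stays at 1 because the correlation is perfect, while NSE and RMSE penalize the constant bias. That difference is one reason correlation-based and error-based metrics are often reported together. The sketch omits guards for constant series, where the denominators would be zero.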

Keywords: Environmental covariables; Hydrological modeling; Supervised learning.

MeSH terms

  • Brazil
  • Hydrology*
  • Linear Models
  • Machine Learning
  • Rivers*