Forecasting of stage-discharge in a non-perennial river using machine learning with gamma test

Heliyon. 2023 May 13;9(5):e16290. doi: 10.1016/j.heliyon.2023.e16290. eCollection 2023 May.

Abstract

Knowledge of the stage-discharge rating curve is useful in designing and planning flood warnings; thus, developing a reliable stage-discharge rating curve is a fundamental and crucial component of water resource system engineering. Since continuous measurement of discharge is often impossible, the stage-discharge relationship is generally used in natural streams to estimate discharge. This paper aims to optimize the rating curve using a generalized reduced gradient (GRG) solver and to test the accuracy and applicability of linear regression (LR) hybridized with other machine learning techniques, namely the linear regression-random subspace (LR-RSS), linear regression-reduced error pruning tree (LR-REPTree), linear regression-support vector machine (LR-SVM), and linear regression-M5 pruned (LR-M5P) models. These hybrid models were applied to and tested on the Gaula Barrage stage-discharge problem. For this purpose, 12 years of historical stage-discharge data were collected and analyzed. Daily flow (m³/s) and stage (m) data for the monsoon season only (June to October), from 03/06/2007 to 31/10/2018, were used for discharge simulation. The most suitable combination of input variables for the LR, LR-RSS, LR-REPTree, LR-SVM, and LR-M5P models was identified using the gamma test. GRG-based rating curve equations were found to be as effective as, and more accurate than, conventional rating curve equations. The outcomes from the GRG, LR, LR-RSS, LR-REPTree, LR-SVM, and LR-M5P models were compared to observed values of daily discharge based on the Nash-Sutcliffe model efficiency coefficient (NSE), Willmott index of agreement (d), Kling-Gupta efficiency (KGE), mean absolute error (MAE), mean bias error (MBE), relative bias in percent (RE), root mean square error (RMSE), Pearson correlation coefficient (PCC), and coefficient of determination (R²).
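As an illustration of the goodness-of-fit measures listed above, the following minimal Python sketch computes them for a pair of observed and simulated discharge series. The abstract does not state the exact formulas, so the common textbook forms are assumed here: KGE in its 2009 form (correlation, variability ratio, bias ratio) and RE as 100 times the summed error divided by the summed observations. The function name `evaluate` is hypothetical.

```python
import numpy as np

def evaluate(obs, sim):
    """Return common goodness-of-fit metrics for simulated vs. observed discharge.

    Formulas are assumed textbook definitions, not taken from the paper.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs                      # simulation error per day
    obs_mean = obs.mean()
    r = np.corrcoef(obs, sim)[0, 1]      # Pearson correlation coefficient

    # Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs_mean) ** 2)
    # Willmott index of agreement.
    d = 1.0 - np.sum(err ** 2) / np.sum(
        (np.abs(sim - obs_mean) + np.abs(obs - obs_mean)) ** 2
    )
    # Kling-Gupta efficiency (assumed 2009 form).
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs_mean         # bias ratio
    kge = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    return {
        "NSE": nse,
        "d": d,
        "KGE": kge,
        "MAE": np.mean(np.abs(err)),
        "MBE": np.mean(err),
        "RE": 100.0 * np.sum(err) / np.sum(obs),   # assumed definition, in percent
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "PCC": r,
        "R2": r ** 2,
    }
```

With these definitions, NSE, d, KGE, PCC, and R² approach 1 for a good model, while MAE, MBE, RE, and RMSE approach 0.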
The LR-REPTree model (combination 1: NSE = 0.993, d = 0.998, KGE = 0.987, PCC(r) = 0.997, and R² = 0.994, with minimum values of RMSE = 0.109, MAE = 0.041, MBE = -0.010, and RE = -0.1%; combination 2: NSE = 0.941, d = 0.984, KGE = 0.923, PCC(r) = 0.973, and R² = 0.947, with minimum values of RMSE = 0.331, MAE = 0.143, MBE = -0.089, and RE = -0.9%) outperformed the GRG, LR, LR-RSS, LR-SVM, and LR-M5P models in all input combinations during the testing period. The standalone LR model and its hybrids (i.e., LR-RSS, LR-REPTree, LR-SVM, and LR-M5P) also performed better than the conventional stage-discharge rating curve, including the GRG method.
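For readers unfamiliar with a conventional rating curve, the sketch below fits the widely used power-law form Q = a(H - H0)^b, where Q is discharge, H is stage, and H0 is the stage of zero flow. The abstract does not give the equation form or the GRG setup, so both the power law and the fitting procedure are assumptions: instead of a GRG solver, this example scans candidate H0 values on a grid and fits a and b by least squares in log space. The function name `fit_rating_curve` and the grid size are hypothetical.

```python
import numpy as np

def fit_rating_curve(stage, discharge, n_grid=200):
    """Fit Q = a * (H - h0) ** b; returns (a, b, h0, sse).

    Assumes all discharges are positive and the stages are not all equal.
    This is a simple grid-search stand-in, not the GRG method of the paper.
    """
    stage = np.asarray(stage, dtype=float)
    discharge = np.asarray(discharge, dtype=float)
    h_min = stage.min()
    span = stage.max() - h_min
    best = None
    # Candidate zero-flow stages lie strictly below the lowest observed stage.
    for h0 in np.linspace(h_min - span, h_min - 1e-3 * span, n_grid):
        # log Q = log a + b * log(H - h0) is linear in log(H - h0).
        b, log_a = np.polyfit(np.log(stage - h0), np.log(discharge), 1)
        a = np.exp(log_a)
        # Compare candidates by squared error in discharge units.
        sse = np.sum((a * (stage - h0) ** b - discharge) ** 2)
        if best is None or sse < best[3]:
            best = (a, b, h0, sse)
    return best

# Usage on synthetic data generated with a = 2.0, b = 1.5, h0 = 0.5.
stage = np.linspace(1.0, 3.0, 20)
discharge = 2.0 * (stage - 0.5) ** 1.5
a, b, h0, sse = fit_rating_curve(stage, discharge)
predicted = a * (stage - h0) ** b
```

A gradient-based solver such as GRG would instead adjust a, b, and H0 jointly to minimize the squared error directly; the grid search is used here only because it needs nothing beyond NumPy.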

Keywords: GRG technique; Logistic regression; Machine learning; Rating curve; Stage-discharge forecasting.