High-frequency data significantly enhances the prediction ability of point and interval estimation

Sci Total Environ. 2024 Feb 20:912:169289. doi: 10.1016/j.scitotenv.2023.169289. Epub 2023 Dec 21.

Abstract

Accurate prediction of dissolved oxygen (DO) dynamics is crucial for understanding the influence of environmental factors on the stability of aquatic ecosystems. However, limited research has been conducted to determine the optimal frequency of water quality monitoring that ensures continuous assessment of water health while minimizing costs. To address these challenges, the present study developed a hybrid stochastic hydrological model (i.e., the ARIMA-GARCH hybrid model) and machine learning (ML) models. The objective of this study is to identify the best-performing model and establish the optimal monitoring frequency. Results revealed that high-frequency DO monitoring data exhibit greater variability than low-frequency data. Moreover, the ARIMA-GARCH model demonstrates promising potential in predicting DO concentrations from low-frequency monitoring data, surpassing the ML models in performance. Furthermore, increasing the monitoring frequency significantly improves the prediction accuracy of the models for both point and interval predictions. For point prediction, R² rose from 0.64 and 0.51 with daily monitoring to 0.96 and 0.99 with 15-min monitoring at CHQ and LHT, respectively. For interval prediction, RIW fell from 2.00 and 1.55 with daily monitoring to 0.02 and 0.16 with 15-min monitoring at CHQ and LHT, respectively. Additionally, a 4-hourly monitoring frequency was found to be optimal for water quality assessment with each model. These findings identify the superior performance of the ARIMA-GARCH model and highlight the crucial role of monitoring frequency in enhancing DO prediction and improving model performance.

Keywords: ARIMA-GARCH model; Different time-scales; Dissolved oxygen; Interval prediction; Machine learning; Point prediction.
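
The two evaluation metrics named in the abstract, and the comparison across monitoring frequencies, can be illustrated with a minimal Python sketch. This is not the authors' code. The abstract does not define RIW, so the sketch assumes it is the mean prediction-interval width divided by the observed range. The helper names (r_squared, relative_interval_width, subsample) and the sample readings are hypothetical.

```python
from statistics import mean


def r_squared(observed, predicted):
    """Coefficient of determination for point predictions."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_interval_width(lower, upper, observed):
    """ASSUMED definition of RIW: mean interval width / observed range."""
    widths = [u - l for l, u in zip(lower, upper)]
    return mean(widths) / (max(observed) - min(observed))


def subsample(series, step):
    """Keep every `step`-th reading of an evenly spaced series."""
    return series[::step]


# Hypothetical DO readings (mg/L) at 15-min resolution covering two days.
do_15min = [8.0 + 0.01 * i for i in range(192)]

# 4 h = 16 x 15 min, 1 day = 96 x 15 min.
do_4h = subsample(do_15min, 16)
do_daily = subsample(do_15min, 96)
```

A higher R² indicates a better point prediction, while a lower RIW indicates a narrower and therefore more informative prediction interval, which matches the direction of the values reported for daily versus 15-min monitoring.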