Imputation of missing daily rainfall data; A comparison between artificial intelligence and statistical techniques

MethodsX. 2023 Oct 27:11:102459. doi: 10.1016/j.mex.2023.102459. eCollection 2023 Dec.

Abstract

Handling missing values is a critical component of data processing in hydrological modeling. The key objective of this research is to assess statistical techniques (STs) and artificial intelligence-based techniques (AITs) for imputing missing daily rainfall values and to recommend a methodology applicable to the mountainous terrain of northern Thailand. In this study, 30 years of daily rainfall data were collected from 20 rainfall stations in northern Thailand, and 25-35 % of the data were randomly deleted from four target stations, based on the Spearman correlation coefficient between the target and neighboring stations. Imputation models were developed on training and testing datasets and statistically evaluated by mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R²), and correlation coefficient (r). The STs used in this study were arithmetic averaging (AA), multiple linear regression (MLR), normal-ratio (NR), the nonlinear iterative partial least squares (NIPALS) algorithm, and linear interpolation. The ST results were compared with those of AITs, including long short-term memory recurrent neural network (LSTM-RNN), M5 model tree (M5-MT), multilayer perceptron neural networks (MLPNN), and support vector regression with polynomial and radial basis function kernels (SVR-poly and SVR-RBF). The findings revealed that the MLR imputation model achieved an average MAE of 0.98, an RMSE of 4.52, and an R² of about 79.6 % across the target stations. The M5-MT model, the most prominent among the AITs, achieved an average MAE of 0.91, an RMSE of about 4.52, and an R² of around 79.8 %. Notably, the MLR technique stood out as the recommended approach because it delivers good estimation results while offering a transparent mechanism and requiring no prior knowledge for model creation.
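As an illustration of the two simplest STs named above and of two of the evaluation metrics, the following is a minimal Python sketch. It assumes the common textbook forms: AA estimates the target value as the plain mean of the neighboring stations' values for the same day, and NR weights each neighbor by the ratio of the target station's long-term normal rainfall to that neighbor's normal. The abstract does not give the exact formulations used in the study, so the function names, arguments, and sample numbers here are hypothetical.

```python
import math


def arithmetic_average(neighbor_values):
    """AA: mean of the neighboring stations' rainfall on the same day."""
    return sum(neighbor_values) / len(neighbor_values)


def normal_ratio(neighbor_values, target_normal, neighbor_normals):
    """NR: mean of neighbor values, each scaled by target_normal / neighbor_normal.

    The 'normal' is a long-term reference rainfall (e.g. mean annual total)
    for each station; this is the textbook form, assumed for illustration.
    """
    weighted = [
        (target_normal / normal) * value
        for value, normal in zip(neighbor_values, neighbor_normals)
    ]
    return sum(weighted) / len(weighted)


def mae(observed, estimated):
    """Mean absolute error between observed and imputed values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Root mean square error between observed and imputed values."""
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(observed))


# Hypothetical data: three days, three neighboring stations (made-up values).
neighbors_by_day = [[0.0, 0.0, 0.0], [10.0, 20.0, 30.0], [5.0, 5.0, 5.0]]
neighbor_normals = [1000.0, 1200.0, 1500.0]  # made-up long-term normals
target_normal = 1200.0                       # made-up target-station normal
observed_target = [0.0, 18.0, 6.0]           # values withheld for evaluation

aa_estimates = [arithmetic_average(day) for day in neighbors_by_day]
nr_estimates = [
    normal_ratio(day, target_normal, neighbor_normals) for day in neighbors_by_day
]

print("AA:", aa_estimates, "MAE:", mae(observed_target, aa_estimates))
print("NR:", nr_estimates, "RMSE:", rmse(observed_target, nr_estimates))
```

In this kind of evaluation, the withheld observations play the role of the randomly deleted 25-35 % of records: each technique is scored only on days where the true value is known but was hidden from the model.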

Keywords: AITs for Imputation missing daily rainfall data; Artificial intelligence; Deep learning; Imputation; Machine learning; Missing data; Neural networks; Rainfall.