An approach based on multivariate distribution and Gaussian copulas to predict groundwater quality using DNN models in a data scarce environment

MethodsX. 2023 Feb 2:10:102034. doi: 10.1016/j.mex.2023.102034. eCollection 2023.

Abstract

Machine learning (ML) models have become a useful tool in water resources modelling. However, they require large datasets for training and validation, which poses challenges in data-scarce environments, particularly in poorly monitored basins. In such scenarios, a Virtual Sample Generation (VSG) method is valuable for developing ML models. The main aim of this manuscript is to introduce a novel VSG method based on multivariate distribution and Gaussian copula, called MVD-VSG, whereby appropriate virtual combinations of groundwater quality parameters can be generated to train a Deep Neural Network (DNN) for predicting the Entropy Weighted Water Quality Index (EWQI) of aquifers even with small datasets. The MVD-VSG is original and was validated for its initial application using sufficient observed datasets collected from two aquifers. The validation results showed that, from only 20 original samples, the MVD-VSG provided enough accuracy to predict EWQI with a Nash-Sutcliffe efficiency (NSE) of 0.87. The companion publication of this Method paper is El Bilali et al. [1].

• Development of MVD-VSG to generate virtual combinations of groundwater parameters in data-scarce environments.
• Training a deep neural network to predict groundwater quality.
• Validation of the method with sufficient observed datasets and sensitivity analysis.
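To make the idea of copula-based virtual sample generation concrete, the following is a minimal sketch of one generic way to draw virtual samples with a Gaussian copula, together with the standard Nash-Sutcliffe efficiency formula. It is not the authors' MVD-VSG implementation: the function names `gaussian_copula_samples` and `nse`, the use of empirical marginals, and the rank-based normal scores are all assumptions made for illustration, and the sketch assumes at least two parameters (columns).

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal, used for Phi and its inverse


def gaussian_copula_samples(observed, n_virtual, seed=0):
    """Draw virtual samples that mimic the dependence structure of `observed`.

    Hypothetical helper: `observed` is an (n_samples, n_params) array with
    n_params >= 2. Marginals are taken as the empirical distributions.
    """
    observed = np.asarray(observed, dtype=float)
    n, d = observed.shape
    rng = np.random.default_rng(seed)

    # 1. Map each column to pseudo-uniform values via ranks in (0, 1).
    ranks = observed.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1.0)

    # 2. Convert to normal scores and estimate their correlation matrix.
    z = np.vectorize(_STD_NORMAL.inv_cdf)(u)
    corr = np.corrcoef(z, rowvar=False)

    # 3. Sample from the fitted multivariate normal (the Gaussian copula).
    z_new = rng.multivariate_normal(np.zeros(d), corr, size=n_virtual)

    # 4. Map back to uniforms, then to the data scale through the
    #    empirical quantiles of each observed column.
    u_new = np.vectorize(_STD_NORMAL.cdf)(z_new)
    columns = [np.quantile(observed[:, j], u_new[:, j]) for j in range(d)]
    return np.column_stack(columns)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)
    variance = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - residual / variance
```

Under these assumptions, a call such as `gaussian_copula_samples(data, 500)` on a 20-row array returns 500 virtual rows whose values stay within the observed range of each parameter. Because the marginals are empirical, this sketch cannot extrapolate beyond the observed minima and maxima; a fitted parametric marginal would be needed for that.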

Keywords: An approach based on copulas to predict groundwater quality using DNN models with small data; Deep neural network; Entropy water quality index; Groundwater quality; Virtual sample generation.