Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods

Sci Total Environ. 2019 Sep 20:684:31-49. doi: 10.1016/j.scitotenv.2019.05.312. Epub 2019 May 21.

Abstract

Water scarcity has become an unpleasant reality in many regions of the world. Groundwater appears to be one of the main natural resources capable of reversing this situation. Uncovering the spatial patterns of groundwater occurrence is crucial for carrying out successful water resources management projects. The main objective of the current study was to provide a novel methodological approach that uses a Genetic Algorithm (GA) to perform feature selection and data mining methods to generate a groundwater spring potential map. Three data mining methods, Naïve Bayes (NB), Support Vector Machine (SVM) and Random Forest (RF), were used to construct a groundwater spring potential map that had over 0.81 probability of occurrence for Wuqi County, Shaanxi Province, China. Groundwater spring locations and sixteen related variables were analyzed, namely: lithology, soil cover, land use cover, normalized difference vegetation index (NDVI), elevation, slope angle, aspect, planform curvature, profile curvature, curvature, stream power index (SPI), stream transport index (STI), topographic wetness index (TWI), mean annual rainfall, distance from river network and distance from road network. The frequency ratio method was used to weight the variables, and a multicollinearity analysis was performed to identify relationships among the parameters and to decide whether to use them. The optimal set of parameters, determined by the GA, reduced the number of parameters to twelve by removing planform curvature, profile curvature, curvature and STI. The Receiver Operating Characteristic curve and the area under the curve (AUROC) were estimated to evaluate the predictive power of each model. The results indicated that the optimized models were more accurate than the original models. The optimized RF model produced the best results (0.9572), followed by the optimized SVM (0.9529) and the optimized NB (0.8235). Overall, the current study highlights the necessity of applying feature selection techniques in groundwater spring assessments and shows that data mining methods may be a highly powerful investigative approach for groundwater spring potential mapping.
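
The abstract names two standard calculations, frequency ratio weighting and AUROC evaluation, without spelling them out. The following is a minimal Python sketch of both under common textbook definitions, not the authors' implementation. The function names, the class labels ("gentle", "steep") and all numbers are hypothetical and chosen only for illustration. The frequency ratio of a class is taken here as the share of springs falling in that class divided by the share of the study area in that class, and AUROC as the probability that a randomly chosen spring cell scores higher than a randomly chosen non-spring cell.

```python
from collections import Counter


def frequency_ratio(cell_classes, spring_classes):
    """Frequency ratio per class: (share of springs) / (share of area).

    cell_classes   -- class label of every cell in the study area (hypothetical input)
    spring_classes -- class label at every known spring location (hypothetical input)
    """
    area = Counter(cell_classes)
    springs = Counter(spring_classes)
    n_cells = len(cell_classes)
    n_springs = len(spring_classes)
    # Counter returns 0 for classes that contain no springs.
    return {
        c: (springs[c] / n_springs) / (area[c] / n_cells)
        for c in area
    }


def auroc(scores_pos, scores_neg):
    """Rank-based AUROC: fraction of (spring, non-spring) pairs in which
    the spring cell has the higher score; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative data only: a slope factor with two classes.
cells = ["gentle"] * 6 + ["steep"] * 4
springs = ["gentle"] * 3 + ["steep"] * 1
fr = frequency_ratio(cells, springs)

# Illustrative model scores for spring and non-spring validation cells.
auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4])
```

A frequency ratio above 1 suggests that springs are over-represented in a class relative to its area, and a value below 1 suggests the opposite. The pairwise AUROC loop is quadratic in the number of cells, so a sort-based variant or a library routine would be preferable for a full raster.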

Keywords: China; Genetic algorithm; Groundwater spring potential mapping; Naïve Bayes; Random Forest; Support Vector Machine.