Evaluation of machine learning algorithms for groundwater quality modeling

Environ Sci Pollut Res Int. 2023 Apr;30(16):46004-46021. doi: 10.1007/s11356-023-25596-3. Epub 2023 Jan 30.

Abstract

Groundwater quality is typically measured through water sampling and lab analysis. Such field-based measurements are costly and time-consuming when applied over a large domain. In this study, we developed a machine learning-based framework to map groundwater quality in an unconfined aquifer in the north of Iran. Groundwater samples were obtained from 248 monitoring wells across the region. The groundwater quality index (GWQI) of each well was calculated and classified into four classes (very poor, poor, good, and excellent) according to the class cut-off values. Factors affecting groundwater quality, including distance to industrial centers, distance to residential areas, population density, aquifer transmissivity, precipitation, evaporation, geology, and elevation, were identified and prepared in the GIS environment. Six machine learning classifiers, namely extreme gradient boosting (XGB), random forest (RF), support vector machine (SVM), artificial neural networks (ANN), k-nearest neighbor (KNN), and Gaussian classifier model (GCM), were used to establish relationships between GWQI and its controlling factors. The algorithms were evaluated using the receiver operating characteristic curve (ROC) and statistical efficiencies (overall accuracy, precision, recall, and F1 score). Accuracy assessment showed that the ML algorithms predicted groundwater quality with high accuracy. Among them, RF was selected as the optimum model given its higher accuracy (overall accuracy, precision, and recall = 0.92; ROC = 0.95). The trained RF model was used to map GWQI classes across the entire region. Results showed that the poor GWQI class is dominant, covering 66% of the study area, followed by the good (19% of the area), very poor (14% of the area), and excellent (<1% of the area) classes. An area of very poor GWQI was observed in the north.
Feature analysis indicated that the distance to industrial centers is the main factor affecting groundwater quality in the region. The study provides a cost-effective methodology for groundwater quality modeling that can be replicated in other regions with similar hydrological and geological settings.
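
The statistical efficiencies named above (overall accuracy, precision, recall, and F1 score) can be illustrated with a minimal sketch for a four-class GWQI problem. This is not the authors' code: the function name evaluate_classification, the toy labels, and the choice of macro averaging (an unweighted mean over classes) are assumptions made for illustration, since the abstract does not state how the per-class scores were averaged.

```python
from collections import Counter

# GWQI classes named in the abstract.
CLASSES = ["very poor", "poor", "good", "excellent"]


def evaluate_classification(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy plus per-class and macro-averaged
    precision, recall, and F1 (hypothetical helper, assumed averaging)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    # Count every (observed class, predicted class) pair.
    pairs = Counter(zip(y_true, y_pred))

    per_class = {}
    for c in classes:
        tp = pairs[(c, c)]
        fp = sum(n for (t, p), n in pairs.items() if p == c and t != c)
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    # Overall accuracy: share of wells whose class was predicted correctly.
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)

    # Macro average: unweighted mean of the per-class scores (an assumption).
    macro = {
        key: sum(per_class[c][key] for c in classes) / len(classes)
        for key in ("precision", "recall", "f1")
    }
    return {"accuracy": accuracy, "macro": macro, "per_class": per_class}


# Toy labels for eight hypothetical wells (not data from the study).
y_true = ["poor", "poor", "poor", "good", "good",
          "very poor", "very poor", "excellent"]
y_pred = ["poor", "poor", "good", "good", "good",
          "very poor", "poor", "excellent"]

report = evaluate_classification(y_true, y_pred)
print(f"overall accuracy: {report['accuracy']:.2f}")
for name, value in report["macro"].items():
    print(f"macro {name}: {value:.2f}")
```

With imbalanced classes such as those reported here (the excellent class covers less than 1% of the area), macro and support-weighted averages can differ noticeably, so the averaging choice should be stated alongside any reported score.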

Keywords: Classification algorithms; GIS; GWQI; Groundwater quality map; Machine learning.

MeSH terms

  • Algorithms
  • Environmental Monitoring* / methods
  • Groundwater* / analysis
  • Machine Learning
  • Neural Networks, Computer