Novel machine learning algorithms to predict the groundwater vulnerability index to nitrate pollution at two levels of modeling

Chemosphere. 2023 Feb:314:137671. doi: 10.1016/j.chemosphere.2022.137671. Epub 2022 Dec 28.

Abstract

Accurate mapping and assessment of the groundwater vulnerability index are crucial for protecting groundwater resources from possible contamination. In this research, novel Machine Learning (ML) regression models, namely k-Neighborhood (KNN), ensemble Extremely Randomized Trees (ERT), and ensemble Bagging regression (BA), were applied at two levels of modeling to improve the DRASTIC-LU model for the Miryang aquifer in South Korea. The predicted outputs from level 1 (KNN and ERT models) were used as inputs for the ensemble BA model in level 2. The groundwater pollution vulnerability index (GPVI), derived from the DRASTIC-LU model and adjusted with NO3-N data, was used as the target data for the ML models. Hyperparameters for all models were tuned using a grid search to determine the most effective model structures. Various statistical metrics and graphical representations were used to compare the predictive performance of the ML models. The ensemble BA model in level 2 was more precise than the standalone KNN and ensemble ERT models in level 1 for predicting GPVI values. Furthermore, the ensemble BA model produced suitable results on unseen data, which suggests that the two-level approach can mitigate overfitting in the testing phase. Therefore, ML modeling at two levels could be an excellent approach for the proactive management of groundwater resources against contamination.
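
The two-level scheme described in the abstract can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. The data are synthetic stand-ins for the eight DRASTIC-LU ratings and the NO3-N-adjusted GPVI target, the weights and hyperparameter grids are illustrative, and the use of out-of-fold level-1 predictions as level-2 training inputs is an assumption that the abstract does not confirm.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor, ExtraTreesRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data (hypothetical): 8 rating columns and a noisy
# weighted-sum target playing the role of the adjusted GPVI.
X = rng.uniform(1, 10, size=(300, 8))
illustrative_weights = np.array([5, 4, 3, 2, 1, 5, 3, 5])  # not from the paper
y = X @ illustrative_weights + rng.normal(0, 2, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def tune(estimator, grid, X_fit, y_fit):
    """Grid-search the hyperparameters and return the refitted best model."""
    search = GridSearchCV(
        estimator, grid, cv=5, scoring="neg_root_mean_squared_error"
    )
    search.fit(X_fit, y_fit)
    return search.best_estimator_


# Level 1: standalone KNN and ensemble ERT, each tuned by grid search.
knn = tune(
    KNeighborsRegressor(),
    {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]},
    X_train,
    y_train,
)
ert = tune(
    ExtraTreesRegressor(random_state=0),
    {"n_estimators": [50, 100], "max_features": [0.5, 1.0]},
    X_train,
    y_train,
)

# Level-2 inputs are the level-1 predictions. For the training rows we use
# out-of-fold predictions (an assumption) so that level 2 does not learn
# from predictions the level-1 models made on rows they were fitted to.
Z_train = np.column_stack(
    [cross_val_predict(m, X_train, y_train, cv=5) for m in (knn, ert)]
)
Z_test = np.column_stack([m.predict(X_test) for m in (knn, ert)])

# Level 2: ensemble Bagging regression on the two level-1 prediction columns.
ba = tune(BaggingRegressor(random_state=0), {"n_estimators": [10, 30]}, Z_train, y_train)

r2 = r2_score(y_test, ba.predict(Z_test))
print(f"Level-2 BA test R^2: {r2:.3f}")
```

Comparing the test-set score of `ba` against those of `knn` and `ert` on the same held-out rows mirrors the kind of level 1 versus level 2 comparison the abstract reports, although the metrics used in the study itself are not specified here.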

Keywords: BA; ERT; GPVI; KNN; Modeling at two levels.

MeSH terms

  • Algorithms
  • Environmental Monitoring
  • Groundwater* / chemistry
  • Nitrates* / analysis
  • Water Pollution / analysis

Substances

  • Nitrates