Optimized stacking, a new method for constructing ensemble surrogate models applied to DNAPL-contaminated aquifer remediation

J Contam Hydrol. 2021 Dec:243:103914. doi: 10.1016/j.jconhyd.2021.103914. Epub 2021 Oct 28.

Abstract

Surfactant-enhanced aquifer remediation (SEAR) is an appropriate method for remediating DNAPL-contaminated aquifers; however, because SEAR is costly, finding the optimal remediation scenario is usually essential. Embedding numerical simulation models of DNAPL remediation within optimization routines is computationally expensive, so surrogate models are a suitable alternative to the numerical models. Ensemble methods are used to enhance the accuracy of surrogate models, and in this study the Stacking ensemble method was applied and compared with conventional methods. First, six machine learning methods were used as surrogate models, various feature scaling techniques were employed, and their impact on the models' performance was evaluated. Bagging and Boosting homogeneous ensemble methods were also used to improve the base models' accuracy. In total, six stand-alone surrogate models and 12 homogeneous ensemble models served as the base input models of the Stacking ensemble model. Because of the large size of the Stacking model, a Bayesian hyper-parameter optimization method was used to find its optimal hyper-parameters. The results showed that Bayesian hyper-parameter optimization performed better than common methods such as random search and grid search. The artificial neural network model whose input data were scaled by the power transformer method had the best performance, with a cross-validation RMSE of 0.065. The Boosting method increased the base models' accuracy more than other homogeneous methods, and the best Boosting model had a test RMSE of 0.039. The Stacking ensemble method significantly increased the base models' accuracy and performed better than other ensemble methods. The best ensemble surrogate model constructed with Stacking had a cross-validation RMSE of 0.016. Finally, the Stacking ensemble model was substituted for the numerical model in a differential evolution optimization model, and the optimal remediation strategy was obtained at a total cost of $72,706.
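The workflow described above (scaled base learners, Bagging and Boosting ensembles, a Stacking meta-model, then differential evolution over the surrogate) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn and SciPy, uses a synthetic stand-in for the numerical simulator, picks three arbitrary base learners instead of the 18 used in the study, and omits Bayesian hyper-parameter optimization. The four decision variables, the 0.8 removal target, and the penalty-based cost function are all hypothetical.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.ensemble import AdaBoostRegressor, BaggingRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for simulator runs (assumption): four normalized
# decision variables in [0, 1] mapped to a DNAPL removal fraction.
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 1.0 - np.exp(-X.sum(axis=1)) + 0.01 * rng.normal(size=200)

# Base learners: one stand-alone model with power-transform scaling,
# one Bagging ensemble, and one Boosting ensemble.
base_learners = [
    ("knn_scaled", make_pipeline(PowerTransformer(), KNeighborsRegressor(n_neighbors=5))),
    ("bagging", BaggingRegressor(DecisionTreeRegressor(max_depth=4),
                                 n_estimators=20, random_state=0)),
    ("boosting", AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
                                   n_estimators=20, random_state=0)),
]

# Stacking: a meta-model (here ridge regression) is trained on the
# out-of-fold predictions of the base learners.
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)

# Cross-validation RMSE of the stacked surrogate.
scores = cross_val_score(stack, X, y, cv=5, scoring="neg_root_mean_squared_error")
cv_rmse = -scores.mean()

# Fit on all samples, then use the surrogate in place of the simulator.
stack.fit(X, y)


def cost(x):
    """Hypothetical cost: total injection effort plus a penalty when the
    surrogate predicts less than the assumed 0.8 removal target."""
    removal = stack.predict(x.reshape(1, -1))[0]
    penalty = 1e3 * max(0.0, 0.8 - removal)
    return float(x.sum() + penalty)


bounds = [(0.0, 1.0)] * 4
result = differential_evolution(cost, bounds, seed=0, maxiter=15,
                                popsize=6, polish=False)

print("CV RMSE:", cv_rmse)
print("Best decision variables:", result.x)
print("Best cost:", result.fun)
```

Because every cost evaluation calls only the fitted surrogate, the optimizer can afford hundreds of evaluations that would be prohibitive with a full numerical simulation, which is the motivation the abstract gives for surrogate modeling.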

Keywords: Bayesian hyper-parameter optimization; DNAPL; Ensemble surrogate model; Feature scaling; Stacking; Surfactant-enhanced aquifer remediation.

MeSH terms

  • Bayes Theorem
  • Computer Simulation
  • Groundwater*
  • Models, Theoretical*
  • Surface-Active Agents

Substances

  • Surface-Active Agents