Identification Framework of Contaminant Spill in Rivers Using Machine Learning with Breakthrough Curve Analysis

Int J Environ Res Public Health. 2021 Jan 24;18(3):1023. doi: 10.3390/ijerph18031023.

Abstract

To minimize the damage from contaminant accidents in rivers, early identification of the contaminant source is crucial. Thus, in this study, a framework combining Machine Learning (ML) and the Transient Storage zone Model (TSM) was developed to predict the spill location and mass of a contaminant source. The TSM was employed to simulate non-Fickian Breakthrough Curves (BTCs), which contain information about the contaminant source. The ML models were then trained on BTC features, characterized by 21 variables, to predict the spill location and mass. The proposed framework was applied to Gam Creek, South Korea, where two tracer tests were conducted. Six ML methods were applied to predict the spill location and mass, and the most relevant BTC features were selected by Recursive Feature Elimination with Cross-Validation (RFECV). Application of the models to field data showed that the ensemble decision tree models, Random Forest (RF) and XGBoost (XGB), were the most efficient and feasible in predicting the contaminant source.
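As a rough illustration of how a BTC can carry information about the spill, the sketch below generates synthetic curves and summarizes each with a handful of shape descriptors. It rests on several assumptions that are not taken from the paper: the curves come from the classical one-dimensional advection-dispersion solution for an instantaneous spill (a Fickian stand-in, not the TSM), the velocity, dispersion coefficient, and cross-sectional area are arbitrary, and the five features are hypothetical examples rather than the 21 variables used in the study.

```python
import math


def ade_btc(x, mass, times, u=0.3, k=2.0, area=5.0):
    """Concentration (g/m^3) at distance x (m) below an instantaneous spill.

    Uses the 1D advection-dispersion solution as a simple stand-in for the TSM.
    u (m/s), k (m^2/s) and area (m^2) are arbitrary illustrative values.
    """
    conc = []
    for t in times:
        spread = math.sqrt(4.0 * math.pi * k * t)
        conc.append(mass / (area * spread)
                    * math.exp(-(x - u * t) ** 2 / (4.0 * k * t)))
    return conc


def btc_features(times, conc):
    """Summarize a uniformly sampled BTC with a few hypothetical features."""
    dt = times[1] - times[0]
    m0 = sum(conc) * dt  # zeroth temporal moment (area under the curve)
    centroid = sum(t * c for t, c in zip(times, conc)) * dt / m0
    variance = sum((t - centroid) ** 2 * c for t, c in zip(times, conc)) * dt / m0
    third = sum((t - centroid) ** 3 * c for t, c in zip(times, conc)) * dt / m0
    peak = max(conc)
    return {
        "peak": peak,
        "time_to_peak": times[conc.index(peak)],
        "area_under_curve": m0,
        "centroid": centroid,
        "skewness": third / variance ** 1.5,
    }


# Sampling times from 10 s to 20000 s at 10 s intervals.
times = [10.0 * i for i in range(1, 2001)]

near = btc_features(times, ade_btc(1000.0, 1000.0, times))    # 1 km, 1 kg
far = btc_features(times, ade_btc(3000.0, 1000.0, times))     # 3 km, 1 kg
double = btc_features(times, ade_btc(1000.0, 2000.0, times))  # 1 km, 2 kg

for name, feats in (("near", near), ("far", far), ("double", double)):
    print(name, {key: round(val, 4) for key, val in feats.items()})
```

In this toy setting the timing features (time to peak, centroid) shift with the spill distance, while the magnitude features (peak, area under the curve) scale with the spilled mass. That separation is the kind of signal a regression model could learn from, although real non-Fickian curves with storage-zone tailing would behave less cleanly.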

Keywords: breakthrough curve analysis; contaminant source identification; ensemble decision tree model; recursive feature elimination cross-validation; tracer test; transient storage zone model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning*
  • Republic of Korea
  • Risk Assessment
  • Rivers*