Contaminant source identification using semi-supervised machine learning

J Contam Hydrol. 2018 May:212:134-142. doi: 10.1016/j.jconhyd.2017.11.002. Epub 2017 Nov 8.

Abstract

Identification of the original groundwater types present in geochemical mixtures observed in an aquifer is a challenging but very important task. Frequently, some of the groundwater types are related to different infiltration and/or contamination sources associated with various geochemical signatures and origins. The characterization of groundwater mixing processes typically requires solving complex inverse models representing groundwater flow and geochemical transport in the aquifer, where the inverse analysis accounts for available site data. Usually, the model is calibrated against the available data characterizing the spatial and temporal distribution of the observed geochemical types. Numerous geochemical constituents and processes may need to be simulated in these models, which further complicates the analyses. In this paper, we propose a new contaminant source identification approach that decomposes the observed mixtures using the Non-negative Matrix Factorization (NMF) method for Blind Source Separation (BSS), coupled with a custom semi-supervised clustering algorithm. Our methodology, called NMFk, is capable of identifying (a) the unknown number of groundwater types and (b) the original geochemical concentrations of the contaminant sources from measured geochemical mixtures with unknown mixing ratios, without any additional site information. NMFk is tested on synthetic and real-world site data. The NMFk algorithm works with geochemical data represented in the form of concentrations, ratios (of two constituents; for example, isotope ratios), and delta notations (standard-normalized stable isotope ratios).
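
The decomposition idea in the abstract can be illustrated with a minimal sketch. It assumes the observed mixtures form a non-negative matrix X (wells by constituents) that factors approximately as X ≈ W H, where H holds the source concentrations and W the mixing ratios. The code below uses plain multiplicative-update NMF and scans candidate numbers of sources by reconstruction error only; it is not the authors' NMFk implementation, and the custom semi-supervised clustering step is omitted. The function names (`nmf`, `relative_error`, `scan_k`) and the synthetic data are hypothetical, chosen for illustration.

```python
import numpy as np


def nmf(X, k, n_iter=500, seed=0, eps=1e-9):
    """Factor X (n x m, non-negative) as W @ H with W (n x k) and H (k x m) >= 0.

    Uses the classic multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(X, W, H):
    """Relative Frobenius-norm reconstruction error of the factorization."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


def scan_k(X, k_values, n_restarts=5):
    """For each candidate k, keep the best error over several random restarts."""
    errors = {}
    for k in k_values:
        errors[k] = min(
            relative_error(X, *nmf(X, k, seed=s)) for s in range(n_restarts)
        )
    return errors


# Hypothetical synthetic site: 3 sources, 6 constituents, 20 wells.
H_true = np.array([
    [10.0, 1.0, 1.0, 5.0, 0.5, 2.0],
    [1.0, 8.0, 2.0, 0.5, 6.0, 1.0],
    [2.0, 1.0, 9.0, 1.0, 1.0, 7.0],
])
rng = np.random.default_rng(42)
W_true = rng.dirichlet(np.ones(3), size=20)  # mixing ratios; each row sums to 1
X = W_true @ H_true                          # observed mixtures

errors = scan_k(X, [1, 2, 3, 4])
for k, err in errors.items():
    print(f"k={k}: relative reconstruction error = {err:.4f}")
```

In this sketch, the error is expected to drop as k approaches the true number of sources and to improve little beyond it. Reconstruction error alone is a crude criterion; the abstract indicates that NMFk additionally relies on clustering to determine the number of groundwater types.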

Keywords: Advection-diffusion transport; Blind Source Separation; Feature extraction; Geochemical signatures; Groundwater contamination; Non-negative Matrix Factorization; Robustness analysis; Semi-supervised learning; Source identification.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Environmental Monitoring / methods
  • Groundwater / chemistry*
  • Isotopes / analysis
  • Supervised Machine Learning*
  • Water Pollutants, Chemical / chemistry*

Substances

  • Isotopes
  • Water Pollutants, Chemical