Identification of organic chemical indicators for tracking pollution sources in groundwater by machine learning from GC-HRMS-based suspect and non-target screening data

Water Res. 2024 Mar 15:252:121130. doi: 10.1016/j.watres.2024.121130. Epub 2024 Jan 13.

Abstract

In this study, the analytical power of gas chromatography coupled to high-resolution mass spectrometry (GC-HRMS) for suspect and non-target screening (SNTS) of organic micropollutants was combined with machine learning tools to propose a novel, robust, and systematic environmental forensics workflow focused on groundwater contamination. Groundwater samples were collected from four regions with different contamination histories (oil [OC], agricultural [AGR], industrial [IND], and landfill [LF]). A total of 252 organic micropollutants were identified, including pharmaceuticals, personal care products, pesticides, polycyclic aromatic hydrocarbons, plasticizers, phenols, organophosphate flame retardants, transformation products, and others, with detection frequencies ranging from 3 % to 100 %. Among the compounds identified by SNTS, 51 chemical indicators (OC: 13, LF: 12, AGR: 19, IND: 7), including chemicals identified at SNTS levels 1 and 2, were pinpointed across all sampling regions by integrating a bootstrapped feature selection method (the bootfs algorithm) with a partial least squares discriminant analysis (PLS-DA) model to determine the potential prevalent contamination sources. The proposed workflow showed good predictive ability (Q² = 0.897). The suggested contamination sources were gasoline, diesel, and/or other light petroleum products for the OC region, anthropogenic activities for the LF region, agricultural and human activities for the AGR region, and industrial/human activities for the IND region. These results suggest that the proposed workflow can select the subset of most diagnostic features in the chemical space that best distinguishes a specific contamination source class.
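The idea behind bootstrapped feature selection can be illustrated with a minimal sketch. It does not reproduce the bootfs algorithm or the PLS-DA model used in the study. As a stated assumption, it swaps in a simple univariate score (the standardized difference in mean between one source class and all other samples) and counts how often each feature ranks at the top across bootstrap resamples. The function names, the toy data, and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, pstdev


def separation_score(values, labels, target):
    """Absolute mean difference (target class vs. rest), scaled by overall spread."""
    inside = [v for v, lab in zip(values, labels) if lab == target]
    outside = [v for v, lab in zip(values, labels) if lab != target]
    if not inside or not outside:
        return 0.0  # a resample may miss one group entirely
    spread = pstdev(values) or 1.0  # guard against zero spread
    return abs(mean(inside) - mean(outside)) / spread


def bootstrap_selection_frequency(X, labels, target, n_boot=200, top_k=1, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top_k."""
    rng = random.Random(seed)
    n_samples, n_features = len(X), len(X[0])
    counts = [0] * n_features
    for _ in range(n_boot):
        # Resample the samples (rows) with replacement.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        boot_labels = [labels[i] for i in idx]
        scores = [
            separation_score([X[i][j] for i in idx], boot_labels, target)
            for j in range(n_features)
        ]
        ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [c / n_boot for c in counts]


def make_toy_data(seed=1):
    """Synthetic intensities: feature k is a planted marker for the k-th class."""
    rng = random.Random(seed)
    classes = ["OC", "AGR", "IND", "LF"]  # region codes borrowed from the abstract
    X, labels = [], []
    for k, cls in enumerate(classes):
        for _ in range(6):
            row = [rng.gauss(0.0, 1.0) for _ in range(6)]
            row[k] += 5.0  # hypothetical marker signal, not real data
            X.append(row)
            labels.append(cls)
    return X, labels


if __name__ == "__main__":
    X, labels = make_toy_data()
    for cls in ["OC", "AGR", "IND", "LF"]:
        freq = bootstrap_selection_frequency(X, labels, cls)
        print(cls, [round(f, 2) for f in freq])
```

Features that are selected in most resamples are stable candidates for indicators of that class, whereas features selected only occasionally are more likely to reflect sampling noise. A multivariate classifier such as PLS-DA would replace the univariate score in a fuller treatment.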

Keywords: Chemical indicators; Environmental forensics; GC-HRMS; Groundwater; Machine learning; Suspect/non-target screening.

MeSH terms

  • Environmental Monitoring / methods
  • Environmental Pollution / analysis
  • Gas Chromatography-Mass Spectrometry
  • Groundwater* / chemistry
  • Humans
  • Organic Chemicals / analysis
  • Water Pollutants, Chemical* / analysis

Substances

  • Organic Chemicals
  • Water Pollutants, Chemical