Extracting Structural Information from Physicochemical Property Measurements Using Machine Learning: A New Approach for Structure Elucidation in Non-targeted Analysis

Environ Sci Technol. 2023 Oct 10;57(40):14827-14838. doi: 10.1021/acs.est.3c03003. Epub 2023 Sep 25.

Abstract

Non-targeted analysis (NTA) has made critical contributions to environmental chemistry and environmental health. One critical bottleneck is the lack of available analytical standards for most chemicals in the environment. Our study explores a novel approach that uses measurements of equilibrium partition ratios between organic solvents and water (KSW) to predict molecular structures. These properties serve as a fingerprint that a machine learning algorithm converts into a series of functional groups (RDKit fragments), which can then be used to search chemical databases. We conducted partitioning experiments using a chemical mixture containing 185 chemicals in 10 different organic solvents and water. Both a liquid chromatography quadrupole time-of-flight mass spectrometer (LC-QTOF MS) and an LC-Orbitrap MS were used to assess the feasibility of the experimental method and the accuracy of the algorithm at predicting the correct functional groups. The two methods showed differences in log KSW, with the QTOF method showing a mean absolute error (MAE) of 0.22 and the Orbitrap method 0.33. These differences carried over into errors in the predictions of RDKit fragments, with an MAE of 0.23 for the QTOF method and 0.31 for the Orbitrap method. Our approach presents a new angle on structure elucidation for NTA and shows promise in assisting with compound identification.
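
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the machine learning model itself is omitted, and the function names, the candidate compounds, and all numbers are hypothetical. It assumes the usual definition of a partition ratio (concentration in the solvent phase divided by concentration in the water phase), that MAE is taken against a set of reference values, and, purely for illustration, that database candidates are ranked by the L1 distance between fragment-count vectors. The keys follow the naming of RDKit fragment descriptors (for example fr_benzene).

```python
import math


def log_ksw(conc_solvent, conc_water):
    """Return log10 of the partition ratio K_SW = C_solvent / C_water.

    Assumes both equilibrium concentrations are in the same units.
    """
    if conc_solvent <= 0 or conc_water <= 0:
        raise ValueError("concentrations must be positive")
    return math.log10(conc_solvent / conc_water)


def mean_absolute_error(measured, reference):
    """Mean absolute difference between paired measured and reference values."""
    if not measured or len(measured) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - r) for m, r in zip(measured, reference)) / len(measured)


def rank_candidates(predicted, candidates):
    """Sort candidate names by L1 distance between fragment-count vectors.

    `predicted` maps fragment names to predicted counts; `candidates` maps
    a candidate name to its own fragment counts. Missing fragments count
    as zero. This ranking rule is an illustrative assumption.
    """
    def distance(counts):
        keys = set(predicted) | set(counts)
        return sum(abs(predicted.get(k, 0) - counts.get(k, 0)) for k in keys)

    return sorted(candidates, key=lambda name: distance(candidates[name]))


if __name__ == "__main__":
    # Hypothetical concentrations for one chemical in one solvent-water pair.
    print(log_ksw(100.0, 1.0))

    # Hypothetical measured and reference log K_SW values.
    print(mean_absolute_error([1.0, 2.0], [1.5, 1.5]))

    # Hypothetical predicted fragments and database candidates.
    predicted = {"fr_benzene": 1, "fr_ketone": 1}
    candidates = {
        "candidate_A": {"fr_benzene": 1, "fr_ketone": 1},
        "candidate_B": {"fr_benzene": 1, "fr_Al_OH": 1},
    }
    print(rank_candidates(predicted, candidates))
```

In this sketch each chemical would have one log KSW value per solvent, so ten solvents give a ten-element fingerprint, and a trained model (not shown) would map that fingerprint to fragment counts before the ranking step.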

Keywords: machine learning; non-targeted analysis; physicochemical properties; structure elucidation.

MeSH terms

  • Chromatography, Liquid / methods
  • Mass Spectrometry / methods
  • Solvents
  • Water*

Substances

  • Solvents
  • Water