Machine Learning for Ionization Potentials and Photoionization Cross Sections of Volatile Organic Compounds

ACS Earth Space Chem. 2023 Apr 6;7(4):863-875. doi: 10.1021/acsearthspacechem.3c00009. eCollection 2023 Apr 20.

Abstract

Molecular ionization potentials (IP) and photoionization cross sections (σ) can affect the sensitivity of photoionization detectors (PIDs) and other sensors for gaseous species. This study employs several machine learning (ML) methods to predict IP values and σ values at 10.6 eV (117 nm) for a dataset of 1251 gaseous organic species. The methods progressively increase in how explicitly they treat the electronic structure of each species. The study compares the ML predictions of IP and σ with values obtained from quantum chemical calculations. When evaluated against measurements, the ML predictions perform comparably to the quantum chemical calculations. Pretraining further reduces the mean absolute errors (ε) relative to the measurements. The graph-based attentive fingerprint model was the most accurate, with ε_IP = 0.23 ± 0.01 eV relative to measured IPs and ε_σ = 2.8 ± 0.2 Mb relative to computed cross sections. The ML predictions for IP correlate well with both the measured IPs (R² = 0.88) and the IPs computed at the M06-2X/aug-cc-pVTZ level (R² = 0.82). The ML predictions for σ correlate reasonably well with the computed cross sections (R² = 0.66). The ML methods developed here for IP and σ values, which represent the properties of a generalizable set of volatile organic compounds (VOCs) relevant to industrial applications and atmospheric chemistry, can be used to quantitatively describe the species-dependent sensitivity of chemical sensors whose sensing mechanism involves ionizing radiation, such as photoionization detectors.
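The abstract reports accuracy as a mean absolute error (ε) and as R² values. The following minimal sketch shows how such metrics can be computed, together with the photon energy to wavelength conversion behind "10.6 eV (117 nm)". It rests on several assumptions. The input values are hypothetical toy numbers, not data from the study. R² is computed here as the squared Pearson correlation coefficient, since the abstract does not say which R² definition was used. The helper `ionizable_at` uses a simplified threshold (photon energy at least equal to the IP) and ignores σ.

```python
# Illustrative sketch only: toy numbers, not results from the study.

HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm

# Hypothetical measured and predicted ionization potentials (eV).
measured_ip = [9.10, 10.40, 8.70, 9.80]
predicted_ip = [9.30, 10.10, 8.85, 9.65]


def mean_absolute_error(y_true, y_pred):
    """Average of |prediction - reference| over all pairs."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared_pearson(x, y):
    """Square of the Pearson correlation coefficient between x and y
    (assumed definition of R^2 for this sketch)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov * cov / (var_x * var_y)


def photon_wavelength_nm(energy_ev):
    """Convert a photon energy in eV to a wavelength in nm (lambda = hc / E)."""
    return HC_EV_NM / energy_ev


def ionizable_at(ip_ev, photon_energy_ev=10.6):
    """Hypothetical helper: simplified check that a photon of the given
    energy can ionize a species with this IP."""
    return ip_ev <= photon_energy_ev


mae = mean_absolute_error(measured_ip, predicted_ip)
r2 = r_squared_pearson(measured_ip, predicted_ip)
print(f"MAE = {mae:.2f} eV, R^2 = {r2:.2f}")
print(f"10.6 eV photon wavelength = {photon_wavelength_nm(10.6):.0f} nm")
```

The squared-Pearson form measures how well predictions track references linearly, while the MAE captures the typical size of the error in eV. Reporting both, as the abstract does, separates systematic offsets from scatter.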