Predicting Primary Biodegradation of Petroleum Hydrocarbons in Aquatic Systems: Integrating System and Molecular Structure Parameters using a Novel Machine-Learning Framework

Environ Toxicol Chem. 2022 Jun;41(6):1359-1369. doi: 10.1002/etc.5328. Epub 2022 Apr 29.

Abstract

Quantitative structure-property relationship (QSPR) models for predicting primary biodegradation of petroleum hydrocarbons have been developed previously. These models rely on experimental data generated under widely varied conditions, whose effects are not adequately captured within the model formalisms. As a result, they exhibit variable predictive performance and cannot account for the role of study design and test conditions in the assessment of environmental persistence. To address these limitations, a novel machine-learning System-Integrated Model (HC-BioSIM) is presented that integrates chemical structure and test system variability, improving prediction of primary disappearance time (DT50) values for petroleum hydrocarbons in fresh and marine water. An expanded, highly curated database of 728 experimental DT50 values (181 unique hydrocarbon structures compiled from 13 primary sources) was used to develop and validate a supervised model tree machine-learning model. Using relatively few parameters (6 system and 25 structural parameters), the model showed significantly better predictive performance (root mean square error = 0.26, R² = 0.67) than existing QSPR models. The model also improved the accuracy of persistence (P) categorization (i.e., "Not P/P/vP"), achieving an accuracy of 96.8% and false-positive and false-negative categorization rates of 0.4% and 2.7%, respectively. This significant improvement in DT50 prediction, and in the resulting persistence categorization, validates the need for models that integrate experimental design and environmental system parameters into biodegradation and persistence assessment. Environ Toxicol Chem 2022;41:1359-1369. © 2022 ExxonMobil Biomedical Sciences, Inc. Environmental Toxicology and Chemistry published by Wiley Periodicals LLC on behalf of SETAC.
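The abstract reports two kinds of performance figures: regression metrics on predicted DT50 values (root mean square error and R²) and categorization metrics (accuracy and false-positive and false-negative rates) for the "Not P/P/vP" classes. The Python sketch below shows how such figures can be computed from paired observed and predicted DT50 values. It is not the authors' HC-BioSIM code and does not implement the model tree itself. Several details are assumptions, because the abstract does not state them. The numeric scale of the RMSE is unspecified (for example, days or log10 days), so the functions accept whatever scale they are given. The P and vP cut-offs (`p_cutoff=40.0`, `vp_cutoff=60.0` days) are hypothetical defaults. A false positive is counted here when the predicted category is more persistent than the observed one, and a false negative when it is less persistent; this definition is also assumed.

```python
import math

# Ordering of persistence categories, from least to most persistent.
SEVERITY = {"Not P": 0, "P": 1, "vP": 2}


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def categorize(dt50_days, p_cutoff=40.0, vp_cutoff=60.0):
    """Map a DT50 (days) to 'Not P', 'P' or 'vP'.

    The default cut-offs are hypothetical placeholders, not values from the paper.
    """
    if dt50_days > vp_cutoff:
        return "vP"
    if dt50_days > p_cutoff:
        return "P"
    return "Not P"


def categorization_rates(observed_days, predicted_days, **cutoffs):
    """Return (accuracy, false_positive_rate, false_negative_rate).

    Assumed definitions: a false positive is a prediction in a more persistent
    category than observed; a false negative is one in a less persistent category.
    All rates are fractions of the total number of pairs.
    """
    n = len(observed_days)
    correct = fp = fn = 0
    for obs, pred in zip(observed_days, predicted_days):
        c_obs = SEVERITY[categorize(obs, **cutoffs)]
        c_pred = SEVERITY[categorize(pred, **cutoffs)]
        if c_pred == c_obs:
            correct += 1
        elif c_pred > c_obs:
            fp += 1
        else:
            fn += 1
    return correct / n, fp / n, fn / n


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    observed = [5.0, 12.0, 45.0, 70.0]
    predicted = [6.0, 10.0, 30.0, 80.0]
    print(categorization_rates(observed, predicted))  # (0.75, 0.0, 0.25)
```

In the toy example, the third pair is observed as "P" but predicted as "Not P", which the sketch counts as one false negative out of four pairs. Keeping the cut-offs as keyword arguments allows separate fresh- and marine-water thresholds to be supplied if an assessment framework requires them.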

Keywords: Biodegradation; Environmental modeling; Hydrocarbon; Machine learning; Persistent compounds; Quantitative structure-property relationship.

MeSH terms

  • Biodegradation, Environmental
  • Hydrocarbons / chemistry
  • Machine Learning
  • Molecular Structure
  • Petroleum* / metabolism

Substances

  • Hydrocarbons
  • Petroleum