Predicting the acute ecotoxicity of chemical substances by machine learning using graph theory

Chemosphere. 2020 Jan:238:124604. doi: 10.1016/j.chemosphere.2019.124604. Epub 2019 Aug 16.

Abstract

Accurate in silico prediction of the ecotoxicity of chemical substances has become an important issue in recent years. Most conventional methods, such as the Ecological Structure-Activity Relationship (ECOSAR) model, cluster chemical substances empirically on the basis of structural information and then predict toxicity with a linear regression model on log P. Because the classification is empirical, prediction accuracy does not improve even when new ecotoxicity test data are added. In addition, most conventional methods are not suitable for predicting the ecotoxicity of inorganic and/or ionized compounds. Furthermore, users face difficulty in handling the multiple Quantitative Structure-Activity Relationship (QSAR) formulas that may apply to a single chemical substance. To overcome these shortcomings, this study developed a new method that applies unsupervised machine learning and graph theory to predict acute ecotoxicity. The proposed machine learning technique is based on the large ecotoxicity test dataset of AIST-MeRAM, a software program developed by the National Institute of Advanced Industrial Science and Technology (AIST) for Multi-purpose Ecological Risk Assessment and Management, and on the Molecular ACCess System (MACCS) keys, which vectorize a chemical structure into 166 bits of binary information. The acute toxicity to fish, daphnids, and algae can be predicted with good accuracy without the log P values and linear regression models required by existing methods. The new method was cross-validated, and comparison with ECOSAR predictions shows that it provides better accuracy for a wider range of chemical substances, including inorganic and ionized compounds.
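The abstract does not spell out the clustering or prediction algorithm. The following is a minimal Python sketch of one plausible graph-based approach, not the authors' method. It assumes that each substance's MACCS fingerprint is stored as the set of its "on" bit indices (0 to 165), that pairwise similarity is the Tanimoto coefficient, that clusters are the connected components of a graph joining substances whose similarity meets a threshold, and that a query's toxicity is estimated as the mean log10 toxicity of its similar training substances. The Tanimoto measure, the threshold, every function name, and the toy fingerprints and toxicity values are hypothetical illustrations, not AIST-MeRAM data.

```python
from statistics import mean

N_BITS = 166  # MACCS keys length stated in the abstract


def make_fp(bits):
    """Build a fingerprint (set of on-bit indices), rejecting out-of-range bits."""
    fp = frozenset(bits)
    if any(not 0 <= b < N_BITS for b in fp):
        raise ValueError("bit index outside the 166-bit MACCS range")
    return fp


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_graph(fps, threshold):
    """Adjacency sets: substances u and v are joined if similarity >= threshold."""
    graph = {name: set() for name in fps}
    names = list(fps)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if tanimoto(fps[u], fps[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def connected_components(graph):
    """Clusters = connected components, found by iterative depth-first search."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(sorted(component))
    return clusters


def predict_log_toxicity(query_fp, fps, log_tox, threshold):
    """Mean log10 toxicity of training substances similar to the query (hypothetical rule)."""
    neighbours = [n for n, fp in fps.items() if tanimoto(query_fp, fp) >= threshold]
    if not neighbours:
        return None  # no sufficiently similar training substance
    return mean(log_tox[n] for n in neighbours)


# Hypothetical toy data for illustration only.
train_fps = {
    "A": make_fp({10, 20, 30, 40}),
    "B": make_fp({10, 20, 30, 41}),
    "C": make_fp({100, 110, 120}),
}
train_log_lc50 = {"A": 1.0, "B": 1.4, "C": -0.5}  # hypothetical log10 LC50 values

THRESHOLD = 0.5  # hypothetical similarity cut-off
print(connected_components(similarity_graph(train_fps, THRESHOLD)))  # [['A', 'B'], ['C']]
print(predict_log_toxicity(make_fp({10, 20, 30, 42}), train_fps, train_log_lc50, THRESHOLD))  # 1.2
```

In this sketch no log P value enters the calculation: the estimate depends only on structural similarity, and adding new training substances changes the graph and the predictions directly, unlike a fixed empirical classification. In a real pipeline the fingerprints would be generated by a cheminformatics toolkit (for example RDKit's `MACCSkeys.GenMACCSKeys`, which returns a 167-bit vector whose bit 0 is unused) rather than hard-coded.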

Keywords: AIST-MeRAM; Chemical substance clustering; ECOSAR; Ecotoxicity prediction; Graph theory; Machine learning.

MeSH terms

  • Animals
  • Computer Simulation
  • Daphnia / drug effects
  • Ecotoxicology / methods*
  • Fishes
  • Linear Models
  • Machine Learning*
  • Models, Theoretical
  • Quantitative Structure-Activity Relationship
  • Risk Assessment / methods
  • Software
  • Water Pollutants, Chemical / chemistry
  • Water Pollutants, Chemical / toxicity*
  • Water Pollution, Chemical / analysis*

Substances

  • Water Pollutants, Chemical