A novel logic-based approach for quantitative toxicology prediction

J Chem Inf Model. 2007 May-Jun;47(3):998-1006. doi: 10.1021/ci600223d. Epub 2007 Apr 24.

Abstract

There is a pressing need for accurate in silico methods to predict the toxicity of molecules that are being introduced into the environment or are being developed into new pharmaceuticals. Predictive toxicology is in the realm of structure activity relationships (SAR), and many approaches have been used to derive such SAR. Previous work has shown that inductive logic programming (ILP) is a powerful approach that circumvents several major difficulties, such as molecular superposition, faced by some other SAR methods. The ILP approach reasons with chemical substructures within a relational framework and yields chemically understandable rules. Here, we report a general new approach, support vector inductive logic programming (SVILP), which extends the essentially qualitative ILP-based SAR to quantitative modeling. First, ILP is used to learn rules, the predictions of which are then used within a novel kernel to derive a support-vector generalization model. For a highly heterogeneous dataset of 576 molecules with known fathead minnow fish toxicity, the cross-validated correlation coefficients (R2CV) from a chemical descriptor method (CHEM) and SVILP are 0.52 and 0.66, respectively. The ILP, CHEM, and SVILP approaches correctly predict 55, 58, and 73%, respectively, of toxic molecules. In a set of 165 unseen molecules, the R2 values from the commercial software TOPKAT and SVILP are 0.26 and 0.57, respectively. In all calculations, SVILP showed significant improvements in comparison with the other methods. The SVILP approach has a major advantage in that it uses ILP automatically and consistently to derive rules, mostly novel, describing fragments that are toxicity alerts. The SVILP is a general machine-learning approach and has the potential of tackling many problems relevant to chemoinformatics including in silico drug design.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Artificial Intelligence
  • Combinatorial Chemistry Techniques
  • Drug Design
  • Drug-Related Side Effects and Adverse Reactions*
  • Logic*
  • Molecular Structure
  • Software
  • Structure-Activity Relationship
  • Toxicology / methods*