A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming

Proteins. 2007 Dec 1;69(4):823-31. doi: 10.1002/prot.21782.

Abstract

Despite the increased recent use of protein-ligand and protein-protein docking in the drug discovery process due to the increases in computational power, the difficulty of accurately ranking the binding affinities of a series of ligands or a series of proteins docked to a protein receptor remains largely unsolved. This problem is of major concern in lead optimization procedures and has lead to the development of scoring functions tailored to rank the binding affinities of a series of ligands to a specific system. However, such methods can take a long time to develop and their transferability to other systems remains open to question. Here we demonstrate that given a suitable amount of background information a new approach using support vector inductive logic programming (SVILP) can be used to produce system-specific scoring functions. Inductive logic programming (ILP) learns logic-based rules for a given dataset that can be used to describe properties of each member of the set in a qualitative manner. By combining ILP with support vector machine regression, a quantitative set of rules can be obtained. SVILP has previously been used in a biological context to examine datasets containing a series of singular molecular structures and properties. Here we describe the use of SVILP to produce binding affinity predictions of a series of ligands to a particular protein. We also for the first time examine the applicability of SVILP techniques to datasets consisting of protein-ligand complexes. Our results show that SVILP performs comparably with other state-of-the-art methods on five protein-ligand systems as judged by similar cross-validated squares of their correlation coefficients. A McNemar test comparing SVILP to CoMFA and CoMSIA across the five systems indicates our method to be significantly better on one occasion. The ability to graphically display and understand the SVILP-produced rules is demonstrated and this feature of ILP can be used to derive hypothesis for future ligand design in lead optimization procedures. The approach can readily be extended to evaluate the binding affinities of a series of protein-protein complexes.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Computer Simulation*
  • Crystallography, X-Ray / methods
  • Databases, Protein
  • Genomics
  • Ligands
  • Molecular Conformation
  • Protein Binding
  • Protein Conformation
  • Protein Interaction Mapping*
  • Proteins / chemistry*
  • Proteomics / methods*
  • Software

Substances

  • Ligands
  • Proteins