Knodle: A Support Vector Machines-Based Automatic Perception of Organic Molecules from 3D Coordinates

J Chem Inf Model. 2016 Aug 22;56(8):1410-9. doi: 10.1021/acs.jcim.5b00512. Epub 2016 Jul 21.

Abstract

Here we address the problem of assigning atom types and bond orders in low molecular weight compounds. For this purpose, we have developed a prediction model based on nonlinear Support Vector Machines (SVM), implemented in a KNOwledge-Driven Ligand Extractor called Knodle, a software library for the recognition of atomic types, hybridization states, and bond orders in the structures of small molecules. We trained the model using an extensive amount of structural data collected from the PDBbindCN database. The accuracy and running time of our method are comparable to those of other popular methods, such as NAOMI, fconv, and I-interpret. On Labute's popular benchmark set consisting of 179 protein-ligand complexes, Knodle makes five to six perception errors, NAOMI makes seven errors, I-interpret makes nine errors, and fconv makes 13 errors. On a larger set of 3,000 protein-ligand structures collected from the PDBbindCN general data set (v2014), Knodle and NAOMI have comparable accuracy, with error rates of approximately 3.9% and 4.7%, respectively; I-interpret has an error rate of 6.0%, while fconv has an error rate of approximately 12.8%. On a more general set of 332,974 entries collected from the Ligand Expo database, Knodle has an error rate of 4.5%. Overall, our study demonstrates the efficiency and robustness of nonlinear SVM in structure perception tasks. Knodle is available at https://team.inria.fr/nano-d/software/Knodle.
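
To make the idea of perception from 3D coordinates with a nonlinear SVM more concrete, the following is a minimal sketch and not Knodle's actual implementation. The abstract does not specify the descriptors or the kernel, so the function names (`angle_deg`, `local_features`, `rbf_kernel`), the choice of features (neighbor count, mean bond length, mean bond angle), and the `gamma` value are all assumptions made for illustration. The Gaussian (RBF) kernel is shown only as one common choice of nonlinear kernel.

```python
import math

def angle_deg(a, b, c):
    """Return the angle a-b-c in degrees, with b as the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

def local_features(center, neighbors):
    """Hypothetical descriptor for one atom (an assumption, not Knodle's):
    (neighbor count, mean bond length, mean angle between neighbor pairs)."""
    if not neighbors:
        return (0, 0.0, 0.0)
    lengths = [math.dist(center, n) for n in neighbors]
    angles = [
        angle_deg(neighbors[i], center, neighbors[j])
        for i in range(len(neighbors))
        for j in range(i + 1, len(neighbors))
    ]
    mean_angle = sum(angles) / len(angles) if angles else 0.0
    return (len(neighbors), sum(lengths) / len(lengths), mean_angle)

def rbf_kernel(x, y, gamma=0.1):
    """Gaussian (RBF) kernel, a common nonlinear SVM kernel.
    The gamma value here is an arbitrary illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

# Idealized trigonal planar center: three neighbors 1.4 Å away, 120° apart.
planar = [
    (1.4 * math.cos(math.radians(t)), 1.4 * math.sin(math.radians(t)), 0.0)
    for t in (0, 120, 240)
]
# Idealized tetrahedral center: four neighbors on alternating cube corners.
tetra = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

f_planar = local_features((0.0, 0.0, 0.0), planar)
f_tetra = local_features((0.0, 0.0, 0.0), tetra)
print(f_planar, f_tetra, rbf_kernel(f_planar, f_tetra))
```

In this toy setting the planar center yields a mean angle close to 120 degrees and the tetrahedral center one close to 109.5 degrees, so the two geometries are separated in feature space. A trained SVM would combine kernel values against its support vectors to decide a class such as a hybridization state; the training step itself is omitted here.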

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Automation
  • Informatics / methods*
  • Ligands
  • Molecular Weight
  • Organic Chemicals / chemistry*
  • Software*
  • Support Vector Machine*

Substances

  • Ligands
  • Organic Chemicals