Machine Learning Estimation of Atom Condensed Fukui Functions

Mol Inform. 2016 Feb;35(2):62-9. doi: 10.1002/minf.201500113. Epub 2015 Nov 24.

Abstract

To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94 % accuracy) of the atom with the highest Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50 % of the cases and, on the average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R(2) =0.68-0.69, improving the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with sensitivity of 55-61 % and specificity of 94-95 %.

Keywords: Bradley-Terry Models; Chemoinformatics; QSPR; Quantum Chemistry; Random Forest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Machine Learning*
  • Models, Chemical
  • Quantitative Structure-Activity Relationship