Machine learning of chemical reactivity from databases of organic reactions

J Comput Aided Mol Des. 2009 Jul;23(7):419-29. doi: 10.1007/s10822-009-9275-2. Epub 2009 May 26.

Abstract

Databases of chemical reactions contain knowledge about the reactivity of specific reagents. Although information is, in general, explicitly available only for compounds reported to react, it is possible to derive information about substructures that do not react in the reported reactions. Both types of information (positive and negative) can be used to train machine learning models to predict whether a compound reacts with a specific reagent. The whole process was implemented with two databases of reactions, one involving BuNH2 as the reagent and the other NaCNBH3. Negative information was derived using MOLMAP molecular descriptors, and classification models were developed with Random Forests, also based on MOLMAP descriptors. MOLMAP descriptors were based exclusively on calculated physicochemical features of molecules. Correct predictions were achieved for approximately 90% of the cases in independent test sets. NaCNBH3 is a selective reducing reagent widely used in organic synthesis, whereas BuNH2 is a nucleophile that mimics the reactivity of the lysine side chain, which is involved in an initiating step of the mechanism leading to skin sensitization.
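The classification step described in the abstract (training on positive and negative examples, then scoring an independent test set) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a bagged ensemble of one-split decision stumps with random feature selection as a simplified stand-in for Random Forests, and the two-component descriptor vectors, labels, and all function names are hypothetical toy values, not MOLMAP descriptors or data from the paper.

```python
import random

def train_stump(rows, labels, feature_ids):
    """Return the (feature, threshold, polarity) stump with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            for polarity in (1, 0):
                # polarity 1: predict "reacts" (1) when the value exceeds t;
                # polarity 0: predict "reacts" (1) when the value is at most t.
                errors = sum(
                    (polarity if r[f] > t else 1 - polarity) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]

def predict_stump(stump, row):
    f, t, polarity = stump
    return polarity if row[f] > t else 1 - polarity

def train_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    while len(forest) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # redraw if the sample lacks positives or negatives
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(train_stump([rows[i] for i in idx], sample_labels, feats))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps: 1 = reacts, 0 = does not react."""
    votes = sum(predict_stump(s, row) for s in forest)
    return 1 if 2 * votes > len(forest) else 0

def accuracy(forest, rows, labels):
    hits = sum(predict_forest(forest, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# Hypothetical toy descriptors (NOT real MOLMAP values): label 1 = reported
# to react (positive), label 0 = derived non-reactive (negative).
train_rows = [(0.70, 0.10), (0.80, 0.30), (0.90, 0.20), (0.75, 0.15),
              (0.10, 0.90), (0.20, 0.70), (0.30, 0.80), (0.25, 0.85)]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Independent test set, kept apart from the training data.
test_rows = [(0.95, 0.05), (0.05, 0.95)]
test_labels = [1, 0]

forest = train_forest(train_rows, train_labels)
test_accuracy = accuracy(forest, test_rows, test_labels)
print(f"fraction of correct test predictions: {test_accuracy:.2f}")
```

The fraction of correct predictions on the held-out set is the kind of figure the abstract reports (approximately 90% in the study). In practice a full Random Forest grows deep trees rather than single-split stumps, and an established library implementation would normally be preferred over this sketch.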

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Borohydrides / chemistry*
  • Butylamines / chemistry*
  • Computer Simulation
  • Databases, Factual
  • Models, Chemical
  • Molecular Structure
  • Quantitative Structure-Activity Relationship*

Substances

  • Borohydrides
  • Butylamines
  • sodium cyanoborohydride