Coupling Matched Molecular Pairs with Machine Learning for Virtual Compound Optimization

J Chem Inf Model. 2017 Dec 26;57(12):3079-3085. doi: 10.1021/acs.jcim.7b00298. Epub 2017 Nov 27.

Abstract

Matched molecular pair (MMP) analyses are widely used in compound optimization projects to gain insights into structure-activity relationships (SAR). The analysis is traditionally performed with statistical methods but can also be combined with machine learning (ML) approaches to extrapolate to novel compounds. The MMP/ML method introduced here combines a fragment-based MMP implementation with different machine learning methods to obtain automated SAR decomposition and prediction. To test prediction capability and model transferability, two compound optimization scenarios were designed: (1) "new fragments", which arises when new fragments are explored for a defined compound series, and (2) "new static core and transformations", which resembles, for instance, the identification of a new compound series. All machine learning methods employed achieved very good results, especially in the new fragments case, but deep neural network models performed best overall and also allowed reliable predictions in the new static core and transformations scenario, where comprehensive SAR knowledge of the compound series is missing. Furthermore, we show that models trained on all available data generalize better than models trained on focused series and can extend beyond the chemical space covered by the training data. Thus, coupling MMP with deep neural networks provides a promising approach for making high-quality predictions on various data sets and in different compound optimization scenarios.
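
The two evaluation scenarios can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `MMP` record, the function names, the fragment and core labels, and the activity differences are hypothetical toy values chosen only to show one way such hold-out splits might be built. In particular, treating a transformation as an unordered fragment pair and dropping training pairs that share a transformation with the test set is an assumption about how scenario (2) could be enforced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MMP:
    """One matched molecular pair (hypothetical record layout)."""
    core: str       # static core shared by both compounds
    frag_from: str  # exchanged fragment in the first compound
    frag_to: str    # exchanged fragment in the second compound
    delta: float    # activity difference, second minus first (toy value)


def transformation(pair):
    """Treat a transformation as an unordered fragment pair (assumption)."""
    return frozenset((pair.frag_from, pair.frag_to))


def split_new_fragments(pairs, held_out_fragments):
    """Scenario 1: test pairs contain a fragment never seen in training."""
    train, test = [], []
    for p in pairs:
        unseen = p.frag_from in held_out_fragments or p.frag_to in held_out_fragments
        (test if unseen else train).append(p)
    return train, test


def split_new_core_and_transformations(pairs, held_out_cores):
    """Scenario 2: test cores AND test transformations are absent from training."""
    test = [p for p in pairs if p.core in held_out_cores]
    test_transformations = {transformation(p) for p in test}
    train = [
        p for p in pairs
        if p.core not in held_out_cores
        and transformation(p) not in test_transformations
    ]
    return train, test


# Toy data, invented for illustration only.
pairs = [
    MMP("c1ccccc1[*]", "F", "Cl", 0.3),
    MMP("c1ccccc1[*]", "F", "OC", -0.2),
    MMP("c1ccncc1[*]", "F", "Cl", 0.4),
    MMP("c1ccncc1[*]", "Cl", "N", 0.1),
    MMP("C1CCCCC1[*]", "Br", "C#N", 0.5),
]

train_1, test_1 = split_new_fragments(pairs, {"OC"})
train_2, test_2 = split_new_core_and_transformations(pairs, {"c1ccncc1[*]"})
print(len(train_1), len(test_1))
print(len(train_2), len(test_2))
```

In this sketch the second split is stricter than the first: a training pair is discarded when it shares a transformation with the held-out series, even if its core is different, so the model is tested without prior SAR knowledge of either the core or the fragment exchange.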

MeSH terms

  • Computer Simulation
  • Drug Discovery / methods*
  • Humans
  • Ligands
  • Machine Learning*
  • Models, Biological
  • Structure-Activity Relationship*

Substances

  • Ligands