Designing Multi-target Compound Libraries with Gaussian Process Models

Mol Inform. 2016 May;35(5):192-8. doi: 10.1002/minf.201501012. Epub 2016 Mar 2.

Abstract

We present the application of machine learning models to selecting G protein-coupled receptor (GPCR)-focused compound libraries. The library design process was realized by ant colony optimization. A proprietary Boehringer-Ingelheim reference set consisting of 3519 compounds tested in dose-response assays at 11 GPCR targets served as training data for machine learning and activity prediction. We compared the usability of the proprietary data with a public data set from ChEMBL. Gaussian process models were trained to prioritize compounds from a virtual combinatorial library. We obtained meaningful models for three of the targets (5-HT2c, MCH, A1), which were experimentally confirmed for 12 of 15 selected and synthesized or purchased compounds. Overall, the models trained on the public data predicted the observed assay results more accurately. The results of this study motivate the use of Gaussian process regression on public data for virtual screening and target-focused compound library design.
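
As an illustration of how a Gaussian process regression model can prioritize candidate compounds, the following minimal sketch fits a model to a few toy descriptor vectors and ranks unseen candidates by predicted activity. It is not the authors' implementation: the abstract does not state the descriptors, kernel, or hyperparameters, so the squared-exponential kernel, the noise level, the two-dimensional descriptors, the activity values, and the candidate names are all assumptions made for this example.

```python
import math

def rbf_kernel(x, y, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two descriptor vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def forward_solve(lower, b):
    """Solve lower @ z = b for z."""
    z = []
    for i, row in enumerate(lower):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z

def backward_solve(lower, z):
    """Solve lower.T @ x = z for x."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (z[i] - s) / lower[i][i]
    return x

def gp_predict(train_x, train_y, query_x, noise=0.1):
    """Posterior mean and variance of a GP whose constant mean is the training mean."""
    mean_y = sum(train_y) / len(train_y)
    centered = [y - mean_y for y in train_y]
    # Kernel matrix of the training set with observation noise on the diagonal.
    k_train = [[rbf_kernel(a, b) + (noise ** 2 if i == j else 0.0)
                for j, b in enumerate(train_x)]
               for i, a in enumerate(train_x)]
    lower = cholesky(k_train)
    alpha = backward_solve(lower, forward_solve(lower, centered))
    predictions = []
    for q in query_x:
        k_star = [rbf_kernel(q, x) for x in train_x]
        mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
        v = forward_solve(lower, k_star)
        var = rbf_kernel(q, q) - sum(vi * vi for vi in v)
        predictions.append((mean, max(var, 0.0)))
    return predictions

# Hypothetical training data: 2-D descriptors with pIC50-like activity values.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_y = [5.0, 6.0, 5.5, 7.5]

# Hypothetical virtual-library candidates to be prioritized.
candidates = {"cand_A": [0.9, 0.9], "cand_B": [0.1, 0.1], "cand_C": [3.0, 3.0]}

names = list(candidates)
posterior = gp_predict(train_x, train_y, [candidates[n] for n in names])
results = {n: p for n, p in zip(names, posterior)}

# Rank candidates by predicted mean activity, highest first.
ranked = sorted(names, key=lambda n: results[n][0], reverse=True)
for name in ranked:
    mean, var = results[name]
    print(f"{name}: predicted activity {mean:.2f}, variance {var:.3f}")
```

In this toy setup, a candidate far from all training compounds (here `cand_C`) receives a prediction close to the training mean together with a large predictive variance. That variance is one reason Gaussian processes are attractive for compound prioritization: a selection rule can weigh predicted activity against model uncertainty rather than relying on the point estimate alone.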

Keywords: ant colony optimization; combinatorial chemistry; drug design; machine learning; polypharmacology.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Combinatorial Chemistry Techniques
  • Databases, Pharmaceutical*
  • Drug Design
  • Machine Learning
  • Models, Molecular
  • Normal Distribution
  • Quantitative Structure-Activity Relationship
  • Receptors, G-Protein-Coupled / antagonists & inhibitors
  • Small Molecule Libraries

Substances

  • Receptors, G-Protein-Coupled
  • Small Molecule Libraries