Creating the New from the Old: Combinatorial Libraries Generation with Machine-Learning-Based Compound Structure Optimization

J Chem Inf Model. 2017 Feb 27;57(2):133-147. doi: 10.1021/acs.jcim.6b00426. Epub 2017 Feb 15.

Abstract

The growing computational capabilities of tools applied in the broadly understood field of computer-aided drug design have made virtual screening extremely popular in the search for new biologically active compounds. Most often, such molecules are sourced from commercially available compound databases, but they can also be sought within libraries of structures generated in silico from existing ligands. Various computational combinatorial approaches are based solely on the chemical structure of compounds and use different types of substitutions to form new molecules. In this study, the starting point for combinatorial library generation was a fingerprint describing the optimal substructural composition with respect to activity toward a given target, obtained with a machine learning-based optimization procedure. Systematic enumeration of all possible connections between the preferred substructures produced target-focused libraries of new potential ligands. The compounds were initially assessed by machine learning methods using a hashed fingerprint to represent molecules; the distribution of their physicochemical properties and their synthetic accessibility were also investigated. The examination of various fingerprints and machine learning algorithms indicated that the Klekota-Roth fingerprint combined with a support vector machine was the optimal combination for such experiments. This study was performed for 8 protein targets, and the obtained compound sets and their characterization are publicly available at http://skandal.if-pan.krakow.pl/comb_lib/ .
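The abstract does not state how the preferred substructures are joined, so the following is only a minimal sketch under explicit assumptions. Each preferred substructure is modeled abstractly as a named fragment with a count of open attachment points, and a "connection" is a single bond between one attachment point on each of two fragments. The sketch enumerates every distinct such connection and folds string features into a fixed-length bit set as a generic hashed fingerprint. The names `Fragment`, `enumerate_connections` and `hashed_fingerprint`, and the SHA-1-based folding, are hypothetical illustrations. They are not the authors' implementation and not the Klekota-Roth or hashed fingerprints used in the study, and the SVM scoring step is omitted.

```python
import hashlib
from dataclasses import dataclass
from itertools import combinations_with_replacement


@dataclass(frozen=True)
class Fragment:
    """Hypothetical model of a preferred substructure: a name plus
    the number of open attachment points it offers."""
    name: str
    attachment_points: int


def enumerate_connections(fragments):
    """Yield every distinct single-bond connection between two fragments.

    A connection is (name_a, point_i, name_b, point_j). Fragment pairs are
    unordered and taken with replacement, so a fragment may also be joined
    to a copy of itself; for such self-pairs the mirror duplicate
    (j, i) of (i, j) is skipped.
    """
    for a, b in combinations_with_replacement(fragments, 2):
        for i in range(a.attachment_points):
            for j in range(b.attachment_points):
                if a == b and j < i:
                    continue  # already produced as (a, j, a, i)
                yield (a.name, i, b.name, j)


def hashed_fingerprint(features, n_bits=1024):
    """Fold feature strings into a set of on-bit indices in [0, n_bits).

    SHA-1 is used instead of the built-in hash() because Python randomizes
    string hashes between runs, which would make fingerprints unstable.
    """
    bits = set()
    for feature in features:
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits


if __name__ == "__main__":
    library = [Fragment("A", 1), Fragment("B", 2)]
    for a_name, i, b_name, j in enumerate_connections(library):
        # Describe each hypothetical product by its two fragments and its bond.
        features = [a_name, b_name, f"{a_name}:{i}-{b_name}:{j}"]
        print(a_name, i, b_name, j, sorted(hashed_fingerprint(features, 64)))
```

For the two fragments above (one with 1 attachment point, one with 2), the enumeration yields 6 connections: 1 for A with A, 2 for A with B and 3 for B with B. A real pipeline would instead build actual molecules with a cheminformatics toolkit and check valence and chemical validity before scoring them.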

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Combinatorial Chemistry Techniques / methods*
  • Computer-Aided Design
  • Databases, Pharmaceutical
  • Drug Design*
  • Machine Learning*
  • Small Molecule Libraries / chemistry*

Substances

  • Small Molecule Libraries