Comprehensive Prediction of Molecular Recognition in a Combinatorial Chemical Space Using Machine Learning

ACS Comb Sci. 2020 Oct 12;22(10):500-508. doi: 10.1021/acscombsci.0c00003. Epub 2020 Aug 17.

Abstract

In combinatorial chemical approaches, optimizing the composition and arrangement of building blocks toward a particular function has been done using a number of methods, including high throughput molecular screening, molecular evolution, and computational prescreening. Here, a different approach is considered that uses sparse measurements of library molecules as the input to a machine learning algorithm which generates a comprehensive, quantitative relationship between covalent molecular structure and function that can then be used to predict the function of any molecule in the possible combinatorial space. To test the feasibility of the approach, a defined combinatorial chemical space consisting of ∼10^12 possible linear combinations of 16 different amino acids was used. The binding of a very sparse, but nearly random, sampling of this amino acid sequence space to 9 different protein targets was measured and used to generate a general relationship between peptide sequence and binding for each target. Surprisingly, measuring as few as a few hundred to a few thousand of the ∼10^12 possible molecules provides sufficient training to be highly predictive of the binding of the remaining molecules in the combinatorial space. Furthermore, measuring only amino acid sequences that bind weakly to a target allows the accurate prediction of which sequences will bind 10-100 times more strongly. Thus, the molecular recognition information contained in a tiny fraction of molecules in this combinatorial space is sufficient to characterize any set of molecules randomly selected from the entire space, a fact that potentially has significant implications for the design of new chemical function using combinatorial chemical libraries.
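
To give a sense of the scale involved, the sketch below counts the sequences in a 16-letter space and shows one common way a peptide could be turned into a fixed-width numeric input for a learning algorithm. It is illustrative only: the abstract does not state the peptide lengths, which 16 amino acids were used, or how sequences were encoded, so the length of 10 (chosen because 16^10 is about 1.1 x 10^12), the `ALPHABET` string, and the helper names `space_size` and `one_hot` are all assumptions.

```python
from typing import List

# Hypothetical 16-letter alphabet; the abstract does not list the amino acids used.
ALPHABET = "ADEFGHKLNPQRSVWY"


def space_size(n_letters: int, max_length: int, min_length: int = 1) -> int:
    """Count linear sequences with lengths min_length..max_length over n_letters."""
    return sum(n_letters ** k for k in range(min_length, max_length + 1))


def one_hot(seq: str, max_length: int, alphabet: str = ALPHABET) -> List[int]:
    """Encode a peptide as a flat 0/1 vector, zero-padded on the right.

    Each position gets len(alphabet) slots; exactly one slot is set to 1
    for every residue present in the sequence.
    """
    if len(seq) > max_length:
        raise ValueError("sequence longer than max_length")
    index = {aa: i for i, aa in enumerate(alphabet)}
    vec = [0] * (max_length * len(alphabet))
    for pos, aa in enumerate(seq):
        vec[pos * len(alphabet) + index[aa]] = 1
    return vec


# Assumed fixed length of 10 residues: 16**10 = 2**40 sequences.
n_sequences = space_size(16, max_length=10, min_length=10)

# Fraction of the space covered by a few thousand measurements (assumed 3000).
sampled_fraction = 3000 / n_sequences

if __name__ == "__main__":
    print(f"sequences of length 10: {n_sequences:.3e}")
    print(f"fraction sampled by 3000 measurements: {sampled_fraction:.1e}")
    print(f"encoded vector width: {len(one_hot('ADEF', 10))}")
```

Under these assumptions, a few thousand measured sequences cover on the order of one part in a billion of the space, which is what makes the reported predictive performance notable.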

Keywords: affinity; ligand; machine learning; molecular recognition; neural network; peptide array; prediction; protein target.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Combinatorial Chemistry Techniques
  • High-Throughput Screening Assays
  • Ligands
  • Machine Learning*
  • Models, Molecular
  • Molecular Structure
  • Peptide Library
  • Peptides / chemistry*
  • Protein Binding
  • Structure-Activity Relationship

Substances

  • Ligands
  • Peptide Library
  • Peptides