De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping

J Chem Inf Model. 2019 Mar 25;59(3):1182-1196. doi: 10.1021/acs.jcim.8b00751. Epub 2019 Mar 5.

Abstract

Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of SMILES-based autoencoders and to generate focused molecular libraries of interest. We built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on SMILES strings from ChEMBL23. Reconstruction rates for the test set molecules exceeded 98%, comparable to those reported in related publications. Using GTM, we visualized the autoencoder latent space on a two-dimensional topographic map. Targeted map zones can be used to generate novel molecular structures by sampling the associated latent space points and decoding them to SMILES. A sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
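The genetic-algorithm sampling of latent space points mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the operators, the fitness function, or the decoder, so everything below is an assumption. The names `evolve_latent` and `toy_score` are hypothetical, latent vectors are plain lists of floats, and the fitness is a stand-in (closeness to an arbitrary target point) for what would in practice be a property estimate on the decoded SMILES.

```python
import random


def evolve_latent(score, seeds, generations=30, pop_size=40, sigma=0.1, rng=None):
    """Toy genetic algorithm over latent vectors (hypothetical sketch).

    score: callable mapping a latent vector to a fitness (higher is better).
    seeds: starting latent vectors, e.g. points sampled from a map zone.
    """
    rng = rng or random.Random(0)
    dim = len(seeds[0])
    population = [list(s) for s in seeds]
    # Fill the initial population with Gaussian-perturbed copies of the seeds.
    while len(population) < pop_size:
        parent = rng.choice(seeds)
        population.append([x + rng.gauss(0, sigma) for x in parent])
    for _ in range(generations):
        # Elitist selection: the top quarter survives unchanged.
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 4]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, dim)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Gaussian mutation of every coordinate.
            children.append([x + rng.gauss(0, sigma) for x in child])
        population = elite + children
    return max(population, key=score)


# Assumed stand-in fitness: negative squared distance to an arbitrary target.
target = [0.5, -0.2, 0.8, 0.1]


def toy_score(z):
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))


seeds = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
best = evolve_latent(toy_score, seeds)
```

In a real pipeline, each candidate vector would be decoded to SMILES and scored on the resulting molecule. Because the elite is carried over unchanged, the best score in this sketch never decreases from one generation to the next.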

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Catalytic Domain
  • Deep Learning*
  • Drug Design*
  • Drug Evaluation, Preclinical
  • Ligands
  • Molecular Docking Simulation
  • Receptor, Adenosine A2A / chemistry
  • Receptor, Adenosine A2A / metabolism
  • Small Molecule Libraries / metabolism
  • Small Molecule Libraries / pharmacology

Substances

  • Ligands
  • Receptor, Adenosine A2A
  • Small Molecule Libraries