Deep Learning-Based Ligand Design Using Shared Latent Implicit Fingerprints from Collaborative Filtering

Raghuram Srinivas; Niraj Verma; Elfi Kraka; Eric C Larson

doi:10.1021/acs.jcim.0c01355

J Chem Inf Model. 2021 May 24;61(5):2159-2174. doi: 10.1021/acs.jcim.0c01355. Epub 2021 Apr 26.

Authors

Raghuram Srinivas¹, Niraj Verma², Elfi Kraka², Eric C Larson¹

Affiliations

¹ Department of Computer Science, Southern Methodist University, Dallas, Texas 75205, United States.
² Department of Chemistry, Southern Methodist University, Dallas, Texas 75205, United States.

PMID: 33899481
DOI: 10.1021/acs.jcim.0c01355

Abstract

In previous work, Srinivas et al. [J. Cheminf. 2018, 10, 56] showed that implicit fingerprints capture ligands and proteins in a shared latent space, typically for virtual screening with collaborative filtering models applied to known bioactivity data. In this work, we extend these implicit fingerprints/descriptors using deep learning techniques to translate latent descriptors into discrete representations of molecules (SMILES), without explicitly optimizing for chemical properties. This allows the design of new compounds based on the latent representation of nearby proteins, thereby encoding druglike properties, including binding affinities to known proteins. The implicit descriptor method does not require any fingerprint similarity search, which frees it from the bias arising from the empirical nature of fingerprint models [Srinivas, R.; J. Cheminf. 2018, 10, 56]. We evaluate the potentially novel drugs generated by our approach in terms of the physical properties of druglike molecules and chemical complexity. Additionally, we assess the reliability of the predicted biological activity of the new compounds using models of protein-ligand interaction, which help estimate the potential binding affinity of the designed compounds. We find that the generated compounds exhibit properties of chemically feasible compounds and are predicted to be excellent binders to known proteins. We also analyze the diversity of the generated compounds using the Tanimoto distance and conclude that they are widely diverse.
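The diversity analysis mentioned in the abstract relies on the Tanimoto distance, which for two binary fingerprints is one minus the ratio of shared "on" bits to the total number of distinct "on" bits. The following is a minimal sketch of that measure only, not the authors' pipeline: the abstract does not say which fingerprints were used, so the toy bit sets and the helper names `tanimoto_distance` and `mean_pairwise_distance` are assumptions made for illustration.

```python
from itertools import combinations


def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """Return 1 - |A & B| / |A | B| for two sets of 'on' fingerprint bits."""
    union = len(a | b)
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def mean_pairwise_distance(fingerprints: list) -> float:
    """Average Tanimoto distance over all unordered pairs (a simple diversity score)."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints, each given as the set of bit positions that are set.
fps = [
    frozenset({1, 2, 3, 4}),
    frozenset({3, 4, 5, 6}),
    frozenset({7, 8}),
]

if __name__ == "__main__":
    print(f"mean pairwise Tanimoto distance: {mean_pairwise_distance(fps):.3f}")
```

A mean distance close to 1 indicates that the compounds share few fingerprint bits, which is the sense in which a generated set would be called diverse; a value close to 0 indicates near-duplicates.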

MeSH terms

Deep Learning*
Ligands
Proteins
Reproducibility of Results

Substances

Ligands
Proteins