DenovoProfiling: A webserver for de novo generated molecule library profiling

Comput Struct Biotechnol J. 2022 Aug 2:20:4082-4097. doi: 10.1016/j.csbj.2022.07.045. eCollection 2022.

Abstract

Various deep learning-based architectures for molecular generation have been proposed for de novo drug design. The proliferation of de novo molecular generation methods and applications has created a strong demand for visualization and functional profiling of de novo generated molecules. The growing number of publicly available chemogenomic databases lays a good foundation and creates opportunities for comprehensive profiling of a de novo library. In this paper, we present DenovoProfiling, a webserver dedicated to de novo library visualization and functional profiling. Currently, DenovoProfiling contains six modules: (1) an identification & visualization module for visualizing chemical structures and identifying previously reported structures, (2) a chemical space module for chemical space exploration using similarity maps, principal components analysis (PCA), drug-like property distributions, and scaffold-based clustering, (3) an ADMET prediction module for predicting the ADMET properties of the de novo molecules, (4) a molecular alignment module for three-dimensional molecular shape analysis, (5) a drugs mapping module for identifying structurally similar drugs, and (6) a target & pathway module for identifying reported targets and the corresponding functional pathways. DenovoProfiling thus provides structural identification, chemical space exploration, drug mapping, and target & pathway information. This comprehensive annotation can give users a clear picture of their de novo library and guide the selection of candidates for chemical synthesis and biological confirmation. DenovoProfiling is freely available at http://denovoprofiling.xielab.net.
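
As an illustration of what a drugs mapping step might involve, the following minimal sketch ranks reference drugs by structural similarity to a generated molecule. The abstract does not state which fingerprint or similarity metric the webserver uses; the Tanimoto coefficient on binary fingerprints is assumed here only because it is a common choice. The function names (`tanimoto`, `most_similar_drugs`), the `threshold` parameter, and the toy fingerprints are hypothetical and are not part of DenovoProfiling.

```python
from typing import Dict, FrozenSet, List, Tuple

# A binary fingerprint represented by the indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by all distinct on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def most_similar_drugs(
    query: Fingerprint,
    drugs: Dict[str, Fingerprint],
    top_k: int = 3,
    threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to top_k (name, similarity) pairs with similarity >= threshold."""
    scored = [(name, tanimoto(query, fp)) for name, fp in drugs.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    # Highest similarity first; ties broken alphabetically for stable output.
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits[:top_k]


# Toy, made-up fingerprints (not real drug data).
reference_drugs = {
    "drug_A": frozenset({1, 2, 3, 4}),
    "drug_B": frozenset({3, 4, 5, 6, 7, 8}),
    "drug_C": frozenset({10, 11}),
}
generated = frozenset({1, 2, 3, 5})

ranking = most_similar_drugs(generated, reference_drugs, top_k=2)
print(ranking)  # [('drug_A', 0.6), ('drug_B', 0.25)]
```

In practice the fingerprints would come from a cheminformatics toolkit rather than hand-written sets, and a similarity cutoff would typically be used to decide which drugs count as structurally similar.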

Keywords: DDR1, Discoidin domain receptor 1; De novo drug design; De novo molecule library; Deep learning; FBDD, Fragment-based drug design; FDR, False discovery rate; GAN, Generative adversarial networks; HTS, High throughput screening; LSTM, Long short-term memory; Library profiling; PCA, Principal components analysis; RNN, Recurrent neural networks; SCA, Scaffold-based classification approach; VAE, Variational autoencoders.