RediscMol: Benchmarking Molecular Generation Models in Biological Properties

Gaoqi Weng; Huifeng Zhao; Dou Nie; Haotian Zhang; Liwei Liu; Tingjun Hou; Yu Kang

doi:10.1021/acs.jmedchem.3c02051

J Med Chem. 2024 Jan 25;67(2):1533-1543. doi: 10.1021/acs.jmedchem.3c02051. Epub 2024 Jan 5.

Authors

Gaoqi Weng¹, Huifeng Zhao¹, Dou Nie¹, Haotian Zhang¹, Liwei Liu², Tingjun Hou¹, Yu Kang¹

Affiliations

¹ Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China.
² Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China.

PMID: 38181194
DOI: 10.1021/acs.jmedchem.3c02051

Abstract

Deep learning-based molecular generative models have attracted growing attention for their ability to generate molecules with novel structures and desired physicochemical properties. However, the evaluation of these models, particularly in a biological context, remains insufficient. To address the limitations of existing metrics and emulate practical application scenarios, we construct the RediscMol benchmark, which comprises active molecules extracted from 5 kinase and 3 GPCR data sets. A set of rediscovery- and similarity-related metrics is introduced to assess the performance of 8 representative generative models (CharRNN, VAE, Reinvent, AAE, ORGAN, RNNAttn, TransVAE, and GraphAF). Our findings based on the RediscMol benchmark differ from those of previous evaluations: CharRNN, VAE, and Reinvent exhibit a greater ability to reproduce known active molecules, while RNNAttn, TransVAE, and GraphAF struggle in this respect despite their notable performance on commonly used distribution-learning metrics. Our evaluation framework may provide valuable guidance for advancing generative models in real-world drug design scenarios.
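
The abstract names rediscovery- and similarity-related metrics but does not define them. The sketch below illustrates what such metrics could look like, and it is not the paper's implementation. It assumes that molecules are given as already-canonicalized SMILES strings (so string equality means the same molecule) and that fingerprints are represented as sets of on-bit indices. The function names `rediscovery_rate`, `tanimoto`, and `max_similarity_to_actives` are hypothetical.

```python
from typing import Iterable, List, Set


def rediscovery_rate(generated: Iterable[str], held_out_actives: Iterable[str]) -> float:
    """Fraction of held-out active molecules that appear among the generated ones.

    Assumption: both inputs are canonical SMILES strings, so exact string
    matching is a valid identity check.
    """
    generated_set = set(generated)
    actives = set(held_out_actives)
    if not actives:
        return 0.0
    return len(actives & generated_set) / len(actives)


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints given as on-bit sets."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def max_similarity_to_actives(
    generated_fps: List[Set[int]], active_fps: List[Set[int]]
) -> List[float]:
    """For each generated molecule, the highest Tanimoto similarity to any known active."""
    if not active_fps:
        return [0.0 for _ in generated_fps]
    return [max(tanimoto(g, a) for a in active_fps) for g in generated_fps]


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the benchmark.
    generated = ["CCO", "c1ccccc1", "CCN"]
    held_out = ["CCO", "CCC"]
    print(rediscovery_rate(generated, held_out))
    print(max_similarity_to_actives([{1, 2, 3}], [{2, 3, 4}, {9}]))
```

In practice the canonical SMILES and fingerprints would come from a cheminformatics toolkit. The set-based representation is used here only to keep the example free of dependencies.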

MeSH terms

Benchmarking*
Drug Design*
Models, Molecular