Deep reinforcement learning enables better bias control in benchmark for virtual screening

Tao Shen; Shan Li; Xiang Simon Wang; Dongmei Wang; Song Wu; Jie Xia; Liangren Zhang

doi:10.1016/j.compbiomed.2024.108165

Comput Biol Med. 2024 Mar:171:108165. doi: 10.1016/j.compbiomed.2024.108165. Epub 2024 Feb 15.

Authors

Tao Shen¹, Shan Li², Xiang Simon Wang³, Dongmei Wang⁴, Song Wu⁵, Jie Xia⁶, Liangren Zhang⁷

Affiliations

¹ State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.
² College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China.
³ Artificial Intelligence and Drug Discovery Core Laboratory for District of Columbia Center for AIDS Research (DC CFAR), Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, USA.
⁴ State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China. Electronic address: wangdmchina@imm.ac.cn.
⁵ State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China. Electronic address: ws@imm.ac.cn.
⁶ State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China. Electronic address: jie.william.xia@hotmail.com.
⁷ State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, China.

PMID: 38402838
DOI: 10.1016/j.compbiomed.2024.108165

Abstract

Virtual screening (VS) has become an integral part of modern drug discovery. The field is now undergoing a new wave of change driven by artificial intelligence and, more specifically, machine learning (ML). The out-of-the-box datasets available for model training or benchmarking are limited in data volume and applicability domain, and they suffer from the biases frequently reported in ML applications. To address these issues, we present a novel benchmark named MUBD^syn. Its main feature is the use of synthetic decoys (i.e., presumed inactives), with deep reinforcement learning leveraged for bias control during decoy generation. We then carried out extensive validation of this new benchmark. First, we confirmed that MUBD^syn was superior to the classical benchmarks in controlling domain bias, artificial enrichment bias and analogue bias. Moreover, analysis of asymmetric validation embedding bias showed that the assessment of ML models based on MUBD^syn was less biased. In addition, MUBD^syn set a more suitable benchmarking challenge for deep learning models than NRLiSt-BDB. Overall, we have demonstrated that MUBD^syn is a close-to-ideal benchmark for VS. The computational tool is publicly available so that MUBD^syn can be easily extended.
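The abstract does not describe the reward used during decoy generation, so the following is only a rough sketch of how a bias-aware score for a candidate decoy might look. It assumes that a good decoy matches the active ligand in simple physicochemical properties (to limit artificial enrichment) while staying structurally dissimilar to it (so it is less likely to be an unrecognised active). All function names, property names, tolerances and the 0.75 similarity cut-off are hypothetical and are not taken from the paper or its tool.

```python
from typing import Dict, FrozenSet


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def property_match(ligand: Dict[str, float], decoy: Dict[str, float],
                   tolerance: Dict[str, float]) -> float:
    """Fraction of properties for which the decoy lies within tolerance of the ligand."""
    hits = sum(abs(ligand[k] - decoy[k]) <= tolerance[k] for k in tolerance)
    return hits / len(tolerance)


def decoy_reward(ligand_props: Dict[str, float], ligand_fp: FrozenSet[int],
                 decoy_props: Dict[str, float], decoy_fp: FrozenSet[int],
                 tolerance: Dict[str, float],
                 max_similarity: float = 0.75) -> float:
    """Hypothetical reward: property matching, gated by structural dissimilarity.

    A decoy that is too similar to the ligand (assumed cut-off: 0.75) scores 0.
    """
    if tanimoto(ligand_fp, decoy_fp) >= max_similarity:
        return 0.0
    return property_match(ligand_props, decoy_props, tolerance)


# Toy data (illustrative values only, not from the paper).
ligand_props = {"mol_weight": 350.0, "logp": 3.0, "h_donors": 2}
tolerance = {"mol_weight": 25.0, "logp": 1.0, "h_donors": 1}
ligand_fp = frozenset({1, 2, 3, 4, 5, 6})

# Property-matched and structurally dissimilar: full reward.
good = decoy_reward(ligand_props, ligand_fp,
                    {"mol_weight": 360.0, "logp": 2.5, "h_donors": 2},
                    frozenset({1, 7, 8, 9, 10, 11}), tolerance)

# Property-matched but a close analogue of the ligand: rejected.
analogue = decoy_reward(ligand_props, ligand_fp,
                        {"mol_weight": 360.0, "logp": 2.5, "h_donors": 2},
                        frozenset({1, 2, 3, 4, 5, 6, 7}), tolerance)

# Dissimilar, but molecular weight falls outside the tolerance: partial reward.
partial = decoy_reward(ligand_props, ligand_fp,
                       {"mol_weight": 400.0, "logp": 2.5, "h_donors": 2},
                       frozenset({1, 7, 8, 9, 10, 11}), tolerance)

print(good, analogue, partial)
```

In a reinforcement learning setting, a score of this kind would typically be returned to a generative model as the reward for each sampled molecule. A real pipeline would compute the properties and fingerprints with a cheminformatics toolkit instead of the hand-written toy values used here.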

Keywords: Benchmarking datasets; Data augmentation; Decoy; Drug design; Generative model; Machine learning; Reinforcement learning; Virtual screening.

MeSH terms

Artificial Intelligence*
Benchmarking*
Bias
Drug Discovery
Machine Learning