Optimization of binding affinities in chemical space with generative pre-trained transformer and deep reinforcement learning

Xiaopeng Xu; Juexiao Zhou; Chen Zhu; Qing Zhan; Zhongxiao Li; Ruochi Zhang; Yu Wang; Xingyu Liao; Xin Gao

doi:10.12688/f1000research.130936.2

F1000Res. 2024 Feb 20:12:757. doi: 10.12688/f1000research.130936.2. eCollection 2023.

Authors

Xiaopeng Xu¹,², Juexiao Zhou¹,², Chen Zhu³, Qing Zhan¹,², Zhongxiao Li¹,², Ruochi Zhang⁴, Yu Wang⁴, Xingyu Liao¹,², Xin Gao¹,²

Affiliations

¹ Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
² Computer, Electrical and Mathematical Sciences and Engineering (CEMSE), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
³ KAUST Catalysis Center (KCC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Saudi Arabia.
⁴ Syneron Technology, Guangzhou, China.

Abstract

Background: The key challenge in drug discovery is to discover novel compounds with desirable properties. Among these properties, binding affinity to a target is a prerequisite and is usually evaluated by molecular docking or quantitative structure-activity relationship (QSAR) models.

Methods: In this study, we developed SGPT-RL, which uses a generative pre-trained transformer (GPT) as the policy network of a reinforcement learning (RL) agent to optimize binding affinity to a target. SGPT-RL was evaluated on the Moses distribution learning benchmark and on two goal-directed generation tasks, with Dopamine Receptor D2 (DRD2) and Angiotensin-Converting Enzyme 2 (ACE2) as the targets. Both a QSAR model and molecular docking were implemented as optimization goals in these tasks. The popular Reinvent method was used as the baseline for comparison.
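The abstract does not spell out how the policy network is updated. As an illustration only, the sketch below shows the augmented-likelihood loss from the original Reinvent formulation, in which the agent's log-likelihood of a generated sequence is pulled toward the prior's log-likelihood plus a scaled score. Whether SGPT-RL uses exactly this objective is an assumption here, and the function names, the `sigma` parameter, and the example numbers are all hypothetical.

```python
from typing import Sequence


def augmented_log_likelihood(prior_logp: float, score: float, sigma: float) -> float:
    """Prior log-likelihood shifted by the scaled target score (Reinvent-style)."""
    return prior_logp + sigma * score


def reinvent_style_loss(
    agent_logp: Sequence[float],
    prior_logp: Sequence[float],
    scores: Sequence[float],
    sigma: float,
) -> float:
    """Mean squared gap between augmented and agent log-likelihoods.

    agent_logp: log P(sequence) under the agent being trained (hypothetical input).
    prior_logp: log P(sequence) under the frozen pre-trained model.
    scores:     target scores, e.g. from a QSAR model or a rescaled docking score.
    sigma:      assumed scalar weighting the score against the prior.
    """
    if not (len(agent_logp) == len(prior_logp) == len(scores)):
        raise ValueError("all inputs must have the same length")
    if not agent_logp:
        raise ValueError("inputs must not be empty")
    gaps = [
        (augmented_log_likelihood(p, s, sigma) - a) ** 2
        for a, p, s in zip(agent_logp, prior_logp, scores)
    ]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Made-up numbers for two sampled sequences.
    print(reinvent_style_loss([-10.0, -8.0], [-12.0, -9.0], [0.5, 0.0], sigma=4.0))
```

In a real training loop the log-likelihoods would be differentiable tensors produced by the policy network, and the loss would be minimized with respect to the agent's parameters only.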

Results: The results on the Moses benchmark showed that SGPT-RL learned good property distributions and generated molecules with high validity and novelty. On the two goal-directed generation tasks, both SGPT-RL and Reinvent were able to generate valid molecules with improved target scores. The SGPT-RL method achieved better results than Reinvent on the ACE2 task, where molecular docking was used as the optimization goal. Further analysis showed that SGPT-RL learned conserved scaffold patterns during exploration.
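For readers unfamiliar with the two benchmark metrics named above, the following is a minimal sketch of how validity and novelty are commonly computed: validity as the fraction of generated strings that parse to a molecule, and novelty as the fraction of unique valid molecules absent from the training set. The `canonicalize` callable is a hypothetical stand-in for a real cheminformatics parser, and the toy version used in the example is not chemically meaningful.

```python
from typing import Callable, Iterable, Optional, Tuple


def validity_and_novelty(
    generated: Iterable[str],
    training_set: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> Tuple[float, float]:
    """Return (validity, novelty) for a batch of generated SMILES strings.

    canonicalize must return a canonical string for a valid molecule and
    None for an invalid one (assumed interface, supplied by the caller).
    """
    canon = [canonicalize(s) for s in generated]
    valid = [c for c in canon if c is not None]
    validity = len(valid) / len(canon) if canon else 0.0

    unique_valid = set(valid)
    train = {c for c in (canonicalize(s) for s in training_set) if c is not None}
    novelty = len(unique_valid - train) / len(unique_valid) if unique_valid else 0.0
    return validity, novelty


def toy_canonicalize(smiles: str) -> Optional[str]:
    """Toy stand-in: treats any string containing '?' as invalid."""
    s = smiles.strip()
    return None if (not s or "?" in s) else s


if __name__ == "__main__":
    gen = ["CCO", "CCN", "C?C", "CCO"]
    print(validity_and_novelty(gen, ["CCO"], toy_canonicalize))
```

In practice the canonicalizer would wrap a cheminformatics toolkit so that different spellings of the same molecule compare equal.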

Conclusions: The superior performance of SGPT-RL on the ACE2 task indicates that it can be applied to virtual screening, where molecular docking is widely used as the selection criterion. In addition, the scaffold patterns learned by SGPT-RL during exploration can help chemists design and discover novel lead candidates.

Keywords: Drug design; hit discovery; molecular docking; reinforcement learning; transformers.

MeSH terms

Alanine Transaminase
Angiotensin-Converting Enzyme 2*
Benchmarking
Learning*
Molecular Docking Simulation

Substances

Angiotensin-Converting Enzyme 2
Alanine Transaminase

Grants and funding

This work was supported by the grants assigned to Prof. Xin Gao from the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award No FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4663-01-01, REI/1/5202-01-01, REI/1/4940-01-01, and RGC/3/4816-01-01.