DeepSA: a deep-learning driven predictor of compound synthesis accessibility

Shihang Wang; Lin Wang; Fenglei Li; Fang Bai

doi:10.1186/s13321-023-00771-3


J Cheminform. 2023 Nov 2;15(1):103. doi: 10.1186/s13321-023-00771-3.

Authors

Shihang Wang^#¹, Lin Wang^#¹, Fenglei Li², Fang Bai³,⁴,⁵

Affiliations

¹ Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China.
² School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China.
³ Shanghai Institute for Advanced Immunochemical Studies and School of Life Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China. baifang@shanghaitech.edu.cn.
⁴ School of Information Science and Technology, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, China. baifang@shanghaitech.edu.cn.
⁵ Shanghai Clinical Research and Trial Center, Shanghai, 201210, China. baifang@shanghaitech.edu.cn.

^# Contributed equally.

Abstract

With the continuous development of artificial intelligence technology, more and more computational models for generating new molecules are being developed. However, we are often confronted with the question of whether these compounds are easy or difficult to synthesize, that is, with their synthetic accessibility. In this study, we propose DeepSA, a deep-learning-based computational model that predicts the synthetic accessibility of compounds and thus provides a useful tool for prioritizing molecules. DeepSA is a chemical language model developed by training on a dataset of 3,593,053 molecules with various natural language processing (NLP) algorithms. It offers advantages over state-of-the-art methods, with a much higher area under the receiver operating characteristic curve (AUROC) of 89.6% in discriminating molecules that are difficult to synthesize. This helps users select less expensive molecules for synthesis, reducing the time and cost of drug discovery and development. Interestingly, a comparison of DeepSA with a graph-attention-based method shows that SMILES alone can also be used to efficiently extract and visualize informative compound features. DeepSA is available online on our group's web server ( https://bailab.siais.shanghaitech.edu.cn/services/deepsa/ ), and the code is available at https://github.com/Shihang-Wang-58/DeepSA .

Keywords: Chemical language model; Deep learning; Drug design; Synthetic accessibility.
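The AUROC reported above can be read as the probability that a randomly chosen hard-to-synthesize molecule receives a higher score than a randomly chosen easy-to-synthesize one. The sketch below computes AUROC directly from that pairwise definition, with ties counted as one half. It is an illustration of the metric only, not DeepSA's evaluation code: the function name `auroc`, the convention that label 1 means "hard to synthesize", and the toy scores are all hypothetical.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC.

    Assumed convention (hypothetical): label 1 = hard to synthesize,
    label 0 = easy to synthesize; a higher score means "more likely hard".
    Returns the fraction of (positive, negative) pairs in which the
    positive is ranked higher, counting ties as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up predictions for four molecules (not data from the paper).
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.6, 0.1]
    print(auroc(labels, scores))  # 0.75: 3 of the 4 positive/negative pairs are ordered correctly
```

The quadratic pairwise loop keeps the definition explicit; for millions of molecules, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.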
