Structure-aware protein solubility prediction from sequence through graph convolutional network and predicted contact map

Jianwen Chen; Shuangjia Zheng; Huiying Zhao; Yuedong Yang

doi:10.1186/s13321-021-00488-1

J Cheminform. 2021 Feb 8;13(1):7. doi: 10.1186/s13321-021-00488-1.

Authors

Jianwen Chen^#¹, Shuangjia Zheng^#¹, Huiying Zhao², Yuedong Yang³ ⁴

Affiliations

¹ School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China.
² Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China.
³ School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China. yangyd25@mail.sysu.edu.cn.
⁴ Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-Sen University), Guangzhou, 510000, China. yangyd25@mail.sysu.edu.cn.

^# Contributed equally.

Abstract

Protein solubility is important in producing new soluble proteins that can reduce the cost of biocatalysts or therapeutic agents. A computational model that accurately predicts protein solubility from the amino acid sequence is therefore highly desirable. Many methods have been developed, but most rely on one-dimensional embeddings of amino acids, which are limited in their ability to capture spatial structural information. In this study, we developed a new structure-aware method, GraphSol, that predicts protein solubility with an attentive graph convolutional network (GCN), where the protein topology attribute graph is constructed from contact maps predicted from the sequence alone. GraphSol was shown to substantially outperform other sequence-based methods. The model proved stable, with a consistent R² of 0.48 in both the cross-validation and the independent test on the eSOL dataset. To the best of our knowledge, this is the first study to use a GCN for sequence-based protein solubility prediction. More importantly, this architecture could be easily extended to other protein prediction tasks that take a raw protein sequence as input.

Keywords: Deep learning; Graph neural network; Predicted contact map; Protein solubility prediction.
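
The pipeline summarized in the abstract (predicted contact map, residue graph, graph convolution, attention-based readout) can be illustrated with a minimal pure-Python sketch. This is not the GraphSol implementation: the 0.5 contact threshold, the single layer, the layer sizes, the random weights, and every function name below are assumptions chosen only to make the idea concrete.

```python
import math
import random


def normalized_adjacency(contact_prob, threshold=0.5):
    """Turn an L x L predicted contact-probability map into D^-1/2 (A + I) D^-1/2.

    The threshold value is an assumption for illustration.
    """
    n = len(contact_prob)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so each residue keeps its own features
        for j in range(i + 1, n):
            if max(contact_prob[i][j], contact_prob[j][i]) >= threshold:
                adj[i][j] = adj[j][i] = 1.0  # undirected edge between residues
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    z = matmul(matmul(a_norm, features), weight)
    return [[max(v, 0.0) for v in row] for row in z]


def attention_pool(node_states, attn_vector):
    """Softmax-weighted sum of residue states, giving one protein-level vector."""
    scores = [sum(h * a for h, a in zip(row, attn_vector)) for row in node_states]
    top = max(scores)  # subtract the max for numerical stability
    exp_scores = [math.exp(s - top) for s in scores]
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    width = len(node_states[0])
    pooled = [sum(w * row[k] for w, row in zip(weights, node_states))
              for k in range(width)]
    return pooled, weights


# Toy data: 6 residues, 4 input features, 8 hidden units (all hypothetical).
random.seed(0)
n_res, n_feat, n_hidden = 6, 4, 8
contact_prob = [[random.random() for _ in range(n_res)] for _ in range(n_res)]
features = [[random.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_res)]
weight = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_feat)]
attn_vector = [random.gauss(0, 1) for _ in range(n_hidden)]
out_vector = [random.gauss(0, 1) for _ in range(n_hidden)]

a_norm = normalized_adjacency(contact_prob)
hidden = gcn_layer(a_norm, features, weight)
protein_vec, attn_weights = attention_pool(hidden, attn_vector)

# Sigmoid keeps the untrained toy output in [0, 1], like a solubility score.
logit = sum(p * w for p, w in zip(protein_vec, out_vector))
solubility = 1.0 / (1.0 + math.exp(-logit))
```

In a trained model the weights would be learned from solubility labels rather than sampled at random; the sketch only shows how structural information from a contact map enters the prediction through the adjacency matrix.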
