DeepSub: Utilizing Deep Learning for Predicting the Number of Subunits in Homo-Oligomeric Protein Complexes

Rui Deng; Ke Wu; Jiawei Lin; Dehang Wang; Yuanyuan Huang; Yang Li; Zhenkun Shi; Zihan Zhang; Zhiwen Wang; Zhitao Mao; Xiaoping Liao; Hongwu Ma

doi:10.3390/ijms25094803

Int J Mol Sci. 2024 Apr 28;25(9):4803. doi: 10.3390/ijms25094803.

Authors

Rui Deng¹ ² ³, Ke Wu⁴, Jiawei Lin¹ ³, Dehang Wang¹ ³, Yuanyuan Huang¹ ³, Yang Li³ ⁵, Zhenkun Shi³, Zihan Zhang⁶, Zhiwen Wang⁷, Zhitao Mao³, Xiaoping Liao² ³, Hongwu Ma³

Affiliations

¹ College of Biotechnology, Tianjin University of Science and Technology, Tianjin 300457, China.
² Haihe Laboratory of Synthetic Biology, Tianjin 300308, China.
³ Biodesign Center, Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China.
⁴ Institute of Biopharmaceutical and Health Engineering, Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China.
⁵ University of Chinese Academy of Sciences, Beijing 100049, China.
⁶ School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China.
⁷ Key Laboratory of Systems Bioengineering (Ministry of Education), Frontier Science Center for Synthetic Biology (Ministry of Education), Department of Biochemical Engineering, School of Chemical Engineering and Technology, Tianjin University, Tianjin 300072, China.

Abstract

The molecular weight (MW) of an enzyme is a critical parameter in enzyme-constrained models (ecModels). It is determined by two factors: which subunits are present and how many copies of each subunit the complex contains. Although the number of subunits (NS) can in principle be obtained from UniProt, this information is not readily available for most proteins. In this study, we address this gap by extracting and curating subunit information from the UniProt database to establish a benchmark dataset. We then propose a model named DeepSub, which combines a protein language model with a bidirectional gated recurrent unit (GRU) to predict NS in homo-oligomers from protein sequences alone. DeepSub achieves an accuracy as high as 0.967, surpassing the performance of QUEEN. To validate DeepSub, we made predictions for protein homo-oligomers that have been reported in the literature but are not documented in the UniProt database. Examples include homoserine dehydrogenase from Corynebacterium glutamicum, Matrilin-4 from Mus musculus and Homo sapiens, and the Multimerin protein family from M. musculus and H. sapiens. The predictions align closely with the findings reported in the literature, supporting the reliability and utility of DeepSub.

Keywords: deep learning; homo-oligomers; protein language model; subunit.
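
The role of NS in the MW of a homo-oligomer can be illustrated with a minimal sketch. It assumes that the complex consists of NS identical copies of one subunit, so that the complex MW is the monomer MW multiplied by NS. The function name `homo_oligomer_mw` and the numeric values are hypothetical and are not taken from DeepSub or from the benchmark dataset.

```python
def homo_oligomer_mw(monomer_mw_kda: float, n_subunits: int) -> float:
    """Return the MW (kDa) of a homo-oligomer built from identical subunits.

    Assumption for illustration: complex MW = monomer MW * number of subunits.
    """
    if n_subunits < 1:
        raise ValueError("n_subunits must be at least 1")
    if monomer_mw_kda <= 0:
        raise ValueError("monomer_mw_kda must be positive")
    return monomer_mw_kda * n_subunits


# Hypothetical example: a 46.0 kDa monomer predicted to form a tetramer (NS = 4).
complex_mw = homo_oligomer_mw(46.0, 4)
print(f"Complex MW: {complex_mw:.1f} kDa")
```

In an ecModel workflow, a predicted NS could be passed as `n_subunits` wherever a curated value is missing; a hetero-oligomer would instead require a sum over the distinct subunits and their copy numbers.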

MeSH terms

Animals
Computational Biology / methods
Databases, Protein*
Deep Learning*
Humans
Mice
Protein Multimerization
Protein Subunits* / chemistry
Protein Subunits* / metabolism

Substances

Protein Subunits
