PhosIDN: an integrated deep neural network for improving protein phosphorylation site prediction by combining sequence and protein-protein interaction information

Bioinformatics. 2021 Dec 11;37(24):4668-4676. doi: 10.1093/bioinformatics/btab551.

Abstract

Motivation: Phosphorylation is one of the most studied post-translational modifications and plays a pivotal role in various cellular processes. Recently, deep learning methods have achieved great success in predicting phosphorylation sites, but most of them are based on convolutional neural networks, which may not capture enough information about long-range dependencies between residues in a protein sequence. In addition, existing deep learning methods use only sequence information to predict phosphorylation sites, so it is highly desirable to develop a deep learning architecture that combines heterogeneous sequence and protein-protein interaction (PPI) information for more accurate phosphorylation site prediction.

Results: We present a novel integrated deep neural network, PhosIDN, for phosphorylation site prediction that extracts and combines sequence and PPI information. In PhosIDN, a sequence feature encoding sub-network captures not only local patterns but also long-range dependencies in protein sequences. A PPI feature encoding sub-network, built as a multi-layer deep neural network, extracts useful PPI features. To combine the two kinds of information effectively, a heterogeneous feature combination sub-network exploits the complex associations between sequence and PPI features, and the combined features are used for the final prediction. Comprehensive experimental results demonstrate that PhosIDN significantly improves phosphorylation site prediction performance and compares favorably with existing general and kinase-specific phosphorylation site prediction methods.
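The two-branch layout described above (a sequence branch, a PPI branch, and a combination step feeding the final prediction) can be illustrated with a minimal pure-Python sketch. It is not the PhosIDN implementation: the window size, the layer widths, the random untrained weights, the example sequence, and the PPI embedding are all assumptions made for illustration. A single dense layer stands in for the sequence feature encoding sub-network, so this toy does not model long-range dependencies, and plain concatenation followed by a dense layer stands in for the heterogeneous feature combination sub-network.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_window(sequence, site, half_width):
    """One-hot encode the residues in a window centred on `site` (0-based).

    Positions that fall outside the sequence are encoded as all-zero rows.
    """
    rows = []
    for pos in range(site - half_width, site + half_width + 1):
        row = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence) and sequence[pos] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(sequence[pos])] = 1.0
        rows.append(row)
    return rows


def dense(inputs, weights, bias):
    """Fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def random_layer(rng, n_in, n_out):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_params(seed, half_width, ppi_dim, hidden=8):
    """Build hypothetical parameters for the three stand-in sub-networks."""
    rng = random.Random(seed)
    seq_dim = (2 * half_width + 1) * len(AMINO_ACIDS)
    out_weights = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    return {
        "seq": random_layer(rng, seq_dim, hidden),      # sequence branch (stand-in)
        "ppi": random_layer(rng, ppi_dim, hidden),      # PPI branch (stand-in)
        "comb": random_layer(rng, 2 * hidden, hidden),  # combination step (stand-in)
        "out": (out_weights, 0.0),
    }


def predict(sequence, site, ppi_embedding, params, half_width=7):
    """Return a score in (0, 1) for the candidate site."""
    window = one_hot_window(sequence, site, half_width)
    seq_input = [value for row in window for value in row]  # flatten the window
    seq_feat = dense(seq_input, *params["seq"])
    ppi_feat = dense(ppi_embedding, *params["ppi"])
    # Concatenate both feature vectors, then mix them in one dense layer.
    combined = dense(seq_feat + ppi_feat, *params["comb"])
    out_weights, out_bias = params["out"]
    logit = sum(w * x for w, x in zip(out_weights, combined)) + out_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical inputs: a made-up sequence with a serine at index 10 and a
# made-up 4-dimensional PPI embedding for the same protein.
params = init_params(seed=0, half_width=7, ppi_dim=4)
score = predict("MKTAYIAKQRSPSLEDG", site=10,
                ppi_embedding=[0.2, 0.0, 0.5, 0.1], params=params)
print(f"untrained score: {score:.3f}")
```

Because the weights are random and untrained, the printed score carries no biological meaning. The sketch only shows how per-site sequence features and per-protein PPI features could flow into a shared prediction head.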

Availability and implementation: PhosIDN is freely available at https://github.com/ustchangyuanyang/PhosIDN.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Neural Networks, Computer*
  • Phosphorylation
  • Protein Processing, Post-Translational
  • Proteins* / metabolism

Substances

  • Proteins