EMBER: multi-label prediction of kinase-substrate phosphorylation events through deep learning

Bioinformatics. 2022 Apr 12;38(8):2119-2126. doi: 10.1093/bioinformatics/btac083.

Abstract

Motivation: Kinase-catalyzed phosphorylation of proteins forms the backbone of signal transduction within the cell, enabling the coordination of numerous processes such as the cell cycle, apoptosis, and differentiation. Although on the order of 105 phosphorylation events have been described, we know the specific kinase performing these functions for <5% of cases. The ability to predict which kinases initiate specific individual phosphorylation events has the potential to greatly enhance the design of downstream experimental studies, while simultaneously creating a preliminary map of the broader phosphorylation network that controls cellular signaling.

Results: We describe Embedding-based multi-label prediction of phosphorylation events (EMBER), a deep learning method that integrates kinase phylogenetic information and motif-dissimilarity information into a multi-label classification model for the prediction of kinase-motif phosphorylation events. Unlike previous deep learning methods that perform single-label classification, we restate the task of kinase-motif phosphorylation prediction as a multi-label problem, allowing us to train a single unified model rather than a separate model for each of the 134 kinase families. We utilize a Siamese neural network to generate novel vector representations, or an embedding, of peptide motif sequences, and we compare our novel embedding to a previously proposed peptide embedding. Our motif vector representations are used, along with one-hot encoded motif sequences, as input to a classification neural network while also leveraging kinase phylogenetic relationships into our model via a kinase phylogeny-weighted loss function. Results suggest that this approach holds significant promise for improving the known map of phosphorylation relationships that underlie kinome signaling.

Availability and implementation: The data and code underlying this article are available in a GitHub repository at https://github.com/gomezlab/EMBER.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Deep Learning*
  • Phosphorylation
  • Phosphotransferases
  • Phylogeny
  • Proteins

Substances

  • Phosphotransferases
  • Proteins