Prediction of phosphorylation sites based on Krawtchouk image moments

Proteins. 2017 Dec;85(12):2231-2238. doi: 10.1002/prot.25388. Epub 2017 Sep 29.

Abstract

Protein phosphorylation is one of the most pervasive post-translational modifications and regulates diverse cellular processes in organisms. Catalyzed by protein kinases, phosphorylation usually occurs at serine (S), threonine (T), or tyrosine (Y) residues. In this contribution, we propose a novel scheme, named KMPhos, for the theoretical prediction of protein phosphorylation sites. First, a numerical matrix is obtained from a protein sequence fragment by replacing each residue character with chemical descriptors of the corresponding amino acid, which approximately describes the chemical environment of the fragment; this matrix is then converted to a grayscale image. Next, the Krawtchouk image moments are calculated and used to build support vector machine models. The 10-fold cross-validation accuracies of the resulting models on the training set are up to 89.7%, 88.6%, and 90.1% for the residues S, Y, and T, respectively. For the independent test set, the prediction accuracies are up to 90.7% (S), 87.8% (T), and 89.3% (Y). The results of ROC analysis and other evaluations are also satisfactory. Compared with several specialized prediction tools, KMPhos provides higher accuracy and reliability. A KMPhos package is available and can be used directly for phosphorylation site prediction.
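
The feature-extraction steps summarized in the abstract (fragment, descriptor matrix, grayscale image, Krawtchouk moments) can be illustrated with a minimal sketch. It is not the KMPhos implementation. The abstract does not list the chemical descriptors, so the two used here (Kyte-Doolittle hydropathy and a coarse side-chain charge) are stand-ins chosen only for illustration, and the function names `fragment_to_image` and `krawtchouk_moments`, the fixed scaling ranges, and the default p = 0.5 are likewise assumptions. The weighted Krawtchouk polynomials follow the standard definition from the image-moment literature.

```python
from math import comb, factorial, sqrt

# ASSUMPTION: stand-in descriptors, not the ones used by KMPhos.
HYDROPATHY = {  # Kyte-Doolittle hydropathy index
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}  # coarse charge, 0 otherwise
RANGES = [(-4.5, 4.5), (-1.0, 1.0)]          # fixed min/max per descriptor


def fragment_to_image(fragment):
    """Map each residue to its descriptors, then scale to 0..255 gray levels."""
    image = []
    for aa in fragment:
        values = [HYDROPATHY[aa], CHARGE.get(aa, 0)]
        image.append([round(255 * (v - lo) / (hi - lo))
                      for v, (lo, hi) in zip(values, RANGES)])
    return image  # one row per residue, one column per descriptor


def pochhammer(a, k):
    """Rising factorial (a)_k = a (a+1) ... (a+k-1)."""
    result = 1.0
    for i in range(k):
        result *= a + i
    return result


def krawtchouk(n, x, p, N):
    """Classical Krawtchouk polynomial K_n(x; p, N) = 2F1(-n, -x; -N; 1/p)."""
    return sum(pochhammer(-n, k) * pochhammer(-x, k)
               / (pochhammer(-N, k) * factorial(k)) * (1.0 / p) ** k
               for k in range(n + 1))


def weighted_krawtchouk(n, x, p, N):
    """K_n scaled by sqrt(w(x)/rho(n)) so the set is orthonormal on 0..N."""
    w = comb(N, x) * p ** x * (1 - p) ** (N - x)   # binomial weight
    rho = ((1 - p) / p) ** n / comb(N, n)          # squared norm of K_n
    return krawtchouk(n, x, p, N) * sqrt(w / rho)


def krawtchouk_moments(image, max_n, max_m, p1=0.5, p2=0.5):
    """Q[n][m] = sum_i sum_j Kbar_n(i) * Kbar_m(j) * image[i][j]."""
    rows, cols = len(image), len(image[0])
    row_basis = [[weighted_krawtchouk(n, i, p1, rows - 1) for i in range(rows)]
                 for n in range(max_n + 1)]
    col_basis = [[weighted_krawtchouk(m, j, p2, cols - 1) for j in range(cols)]
                 for m in range(max_m + 1)]
    return [[sum(row_basis[n][i] * col_basis[m][j] * image[i][j]
                 for i in range(rows) for j in range(cols))
             for m in range(max_m + 1)]
            for n in range(max_n + 1)]


if __name__ == "__main__":
    # Hypothetical 9-residue fragment centred on a candidate serine.
    img = fragment_to_image("RSLSSPTEY")
    moments = krawtchouk_moments(img, max_n=3, max_m=1)
    features = [q for row in moments for q in row]  # flat feature vector
    print(len(features), "moment features")
```

In a full pipeline, the flattened moment vector for each candidate site would be passed to a support vector machine classifier (for example `sklearn.svm.SVC`, if scikit-learn is available); the kernel, the moment orders, and the fragment length would all need to be tuned and are not specified by the abstract.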

Keywords: Arabidopsis thaliana; Krawtchouk moments; digital image; phosphorylation site; support vector machine.

MeSH terms

  • Amino Acid Sequence
  • Arabidopsis / genetics
  • Arabidopsis / metabolism*
  • Arabidopsis Proteins / genetics
  • Arabidopsis Proteins / metabolism*
  • Area Under Curve
  • Computational Biology / methods
  • Phosphorylation
  • Protein Kinases / genetics
  • Protein Kinases / metabolism
  • Protein Processing, Post-Translational*
  • ROC Curve
  • Serine / metabolism*
  • Support Vector Machine
  • Threonine / metabolism*
  • Tyrosine / metabolism*

Substances

  • Arabidopsis Proteins
  • Threonine
  • Tyrosine
  • Serine
  • Protein Kinases