RAM-PGK: Prediction of Lysine Phosphoglycerylation Based on Residue Adjacency Matrix

Genes (Basel). 2020 Dec 20;11(12):1524. doi: 10.3390/genes11121524.

Abstract

Background: Post-translational modification (PTM) is a biological process that is associated with the modification of proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, out of which, lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have shown to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system of phosphoglycerylated lysine residues. The development of a predictor for phosphoglycerylation prediction is not a first, but it is necessary as the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues.

Results: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset was employed for this purpose, which contained experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted them into feature vectors, which is used to build the phosphoglycerylation predictor.

Conclusion: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in terms of performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Mathews correlation coefficient.

Keywords: amino acids; lysine; non-phosphoglycerylation; phosphoglycerylation; post-translational modification; predictor; protein lysine modification database; protein sequence; residue adjacency matrix; support vector machine.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Datasets as Topic*
  • Glyceric Acids / metabolism*
  • Lysine / chemistry
  • Lysine / metabolism*
  • Protein Processing, Post-Translational*
  • ROC Curve
  • Software
  • Support Vector Machine*

Substances

  • Glyceric Acids
  • 3-phosphoglycerate
  • Lysine