Optimization of serine phosphorylation prediction in proteins by comparing human engineered features and deep representations

Anal Biochem. 2021 Feb 15;615:114069. doi: 10.1016/j.ab.2020.114069. Epub 2020 Dec 16.

Abstract

Deep representations can be used to replace human-engineered representations, which are constrained by certain limitations. For the prediction of protein post-translational modification (PTM) sites, the research community uses different feature extraction techniques applied to pseudo amino acid composition (PseAAC). Serine phosphorylation is one of the most important PTMs, as it is the most frequently occurring and is important for various biological functions. Creating efficient representations from large protein sequences to predict PTM sites is a time- and resource-intensive task. In this study we propose, implement and evaluate the use of deep learning to learn effective protein data representations from PseAAC in order to develop data-driven PTM detection systems, and we compare these learned representations with two human-engineered representations. The comparisons are performed by training an XGBoost-based classifier on each representation. The best scores were achieved by the RNN-LSTM-based deep representation and the CNN-based deep representation, with accuracy scores of 81.1% and 78.3%, respectively. The two human-engineered representations scored 77.3% and 74.9%. Based on these results, it is concluded that deep features are a promising replacement for manual feature engineering, identifying phosphoserine (PhosS) sites efficiently and accurately, which can help scientists understand the mechanism of this modification in proteins.
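The abstract does not describe how candidate sites are presented to the models, so the following is only a minimal sketch of one common preprocessing step for site-level PTM prediction: cutting a fixed-length window around each serine and integer-encoding it so that it could be fed to a sequence model such as an LSTM or CNN. The window half-width, the padding symbol `X`, and the function names `serine_windows` and `integer_encode` are illustrative assumptions, not details taken from the study.

```python
# Sketch only: the window half-width and padding symbol are illustrative
# assumptions, not values reported in the study.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for positions beyond the sequence ends
# Index 0 is reserved for padding and any unrecognised residue.
INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def serine_windows(sequence, half_width=10):
    """Return (position, window) pairs, one per serine in the sequence.

    Each window has length 2 * half_width + 1 and is centred on the serine;
    positions are 0-based indices into the original sequence.
    """
    seq = sequence.upper()
    padded = PAD * half_width + seq + PAD * half_width
    windows = []
    for pos, residue in enumerate(seq):
        if residue == "S":
            # Index pos in seq corresponds to pos + half_width in padded,
            # so the slice starting at pos is centred on the serine.
            windows.append((pos, padded[pos:pos + 2 * half_width + 1]))
    return windows


def integer_encode(window):
    """Map each residue to an integer index (0 for padding or unknown)."""
    return [INDEX.get(residue, 0) for residue in window]


if __name__ == "__main__":
    toy_sequence = "MKSAPSTGW"  # made-up sequence, for illustration only
    for pos, window in serine_windows(toy_sequence, half_width=3):
        print(pos, window, integer_encode(window))
```

Each encoded window would then be paired with a label (phosphorylated or not) taken from an annotated dataset. A learned or hand-engineered representation of the window would be passed to the downstream classifier.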

Keywords: Deep features; Phosphorylation; Phosphoserine; Position relevancy; Statistical moments.

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / chemistry
  • Computational Biology / methods*
  • Deep Learning
  • Humans
  • Models, Biological
  • Phosphorylation
  • Protein Processing, Post-Translational*
  • Proteins / chemistry*
  • Proteins / metabolism
  • Serine / metabolism*

Substances

  • Amino Acids
  • Proteins
  • Serine