iPhos-PseEvo: Identifying Human Phosphorylated Proteins by Incorporating Evolutionary Information into General PseAAC via Grey System Theory

Mol Inform. 2017 May;36(5-6). doi: 10.1002/minf.201600010. Epub 2016 May 12.

Abstract

Protein phosphorylation plays a critical role in the human body by altering the structural conformation of a protein, causing it to become activated or deactivated, or by modifying its function. Given an uncharacterized protein sequence, can we predict whether or not it may be phosphorylated? This is no doubt a very meaningful problem for both basic research and drug development. Unfortunately, to the best of our knowledge, no high-throughput bioinformatics tool has so far been developed to address this basic but important problem, owing to its extreme complexity and the lack of sufficient training data. Here we propose a predictor called iPhos-PseEvo by (1) incorporating the protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via the grey system theory, (2) balancing out the skewed training datasets by the asymmetric bootstrap approach, and (3) constructing an ensemble predictor by fusing an array of individual random forest classifiers through a voting system. Rigorous jackknife tests indicate that iPhos-PseEvo achieves very promising success rates even for such a difficult problem. A user-friendly web-server for iPhos-PseEvo has been established at http://www.jci-bioinfo.cn/iPhos-PseEvo, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can be used to analyze many other problems in protein science as well.
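Steps (2) and (3) above can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: the function names (`balanced_subsets`, `train_centroid`, `predict_centroid`, `vote`) are hypothetical, the majority class is resampled with replacement down to the size of the minority class, and a simple nearest-centroid learner stands in for the random forest classifiers used in the paper. The feature vectors are assumed to be fixed-length numeric lists, as a PseAAC-style encoding would produce.

```python
import random
from collections import Counter


def balanced_subsets(pos, neg, n_subsets, seed=0):
    """Build balanced training sets from a skewed dataset.

    Assumption: every minority-class sample is kept, and an equally
    sized sample of the majority class is drawn with replacement.
    Label 1 marks samples from `pos`, label 0 samples from `neg`.
    """
    rng = random.Random(seed)
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    small_label = 1 if small is pos else 0
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.choices(large, k=len(small))
        data = [(x, small_label) for x in small]
        data += [(x, 1 - small_label) for x in drawn]
        subsets.append(data)
    return subsets


def train_centroid(data):
    """Stand-in base learner (NOT a random forest): per-class centroids."""
    sums, counts = {}, Counter()
    for x, y in data:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_centroid(model, x):
    """Return the label whose centroid is closest to x (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist2(model[y]))


def vote(models, x):
    """Fuse the individual classifiers by simple majority vote."""
    ballots = Counter(predict_centroid(m, x) for m in models)
    return ballots.most_common(1)[0][0]
```

To use the sketch, train one model per balanced subset, for example `models = [train_centroid(s) for s in balanced_subsets(pos, neg, 5)]`, and classify a query vector with `vote(models, x)`. An odd number of subsets avoids tied votes in the two-class case.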

Keywords: Chou's general PseAAC; Disease-related phosphorylation; Evolutionary information; Fusion ensemble classifier; Grey system model; Random forest classifiers.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Evolution, Molecular*
  • Humans
  • Phosphoproteins / chemistry*
  • Phosphoproteins / classification
  • Phosphoproteins / genetics
  • Software*
  • Systems Theory*

Substances

  • Phosphoproteins