Multi-iPPseEvo: A Multi-label Classifier for Identifying Human Phosphorylated Proteins by Incorporating Evolutionary Information into Chou's General PseAAC via Grey System Theory

Mol Inform. 2017 Mar;36(3). doi: 10.1002/minf.201600085. Epub 2016 Sep 29.

Abstract

Predicting phosphorylated proteins is a challenging problem, particularly when query proteins have multi-label features, meaning that they may be phosphorylated at two or more different types of amino acid residue. In fact, human proteins are usually phosphorylated at serine, threonine and tyrosine. By introducing the "multi-label learning" approach, we developed a novel predictor, called Multi-iPPseEvo, that can handle systems containing both single- and multi-label phosphorylated proteins. It was built by (1) incorporating protein sequence evolutionary information into the general pseudo amino acid composition (PseAAC) via grey system theory, (2) balancing the skewed training datasets with the asymmetric bootstrap approach, and (3) constructing an ensemble predictor by fusing an array of individual random forest classifiers through a voting system. Rigorous cross-validations using a set of multi-label metrics indicate that the multi-label phosphorylation predictor is very promising. The current approach represents a new strategy for dealing with multi-label biological problems, and the software is freely available for academic use at http://www.jci-bioinfo.cn/Multi-iPPseEvo.
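The abstract does not spell out how grey system theory turns evolutionary information into PseAAC features. A common building block in grey system theory is the GM(1,1) grey model. It fits a sequence x0 with the relation x0(k) + a*z1(k) = b, where z1 is the mean of consecutive running sums. The two fitted coefficients (a, b) then give a compact summary of the sequence. The sketch below makes several assumptions that do not come from the paper:

- evolutionary information arrives as a position-specific scoring matrix (PSSM) with one row per residue and one column per amino acid;
- scores are squashed with a sigmoid;
- GM(1,1) is fitted separately to each column.

The function names `gm11_coefficients` and `pssm_grey_features` are hypothetical. The grey model variant and normalization actually used by Multi-iPPseEvo may differ.

```python
import math
from typing import List, Sequence, Tuple


def gm11_coefficients(x0: Sequence[float]) -> Tuple[float, float]:
    """Fit the standard GM(1,1) model x0(k) + a*z1(k) = b by least squares.

    Returns (a, b): the development coefficient and the grey input.
    """
    if len(x0) < 3:
        raise ValueError("GM(1,1) needs at least 3 values")
    # 1-AGO: accumulated generating operation (running sum of x0)
    x1: List[float] = []
    total = 0.0
    for v in x0:
        total += v
        x1.append(total)
    # Background values: mean of consecutive accumulated values, k = 2..n
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = [float(v) for v in x0[1:]]
    m = len(z)
    # Closed-form solution of the 2x2 normal equations for B = [[-z_k, 1]], Y = [x0(k)]
    s_zz = sum(zk * zk for zk in z)
    s_z = sum(z)
    s_zy = sum(zk * yk for zk, yk in zip(z, y))
    s_y = sum(y)
    det = m * s_zz - s_z * s_z
    if abs(det) < 1e-12:
        raise ValueError("degenerate sequence: background values are all equal")
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    return a, b


def pssm_grey_features(pssm: List[List[float]]) -> List[float]:
    """Hypothetical feature builder: sigmoid-normalize each PSSM score, then
    fit GM(1,1) to every amino-acid column along the sequence.

    A PSSM with 20 columns yields 40 features, (a, b) per column.
    """
    if not pssm:
        raise ValueError("empty PSSM")
    features: List[float] = []
    for j in range(len(pssm[0])):
        # Sigmoid keeps values in (0, 1), so running sums strictly increase
        column = [1.0 / (1.0 + math.exp(-row[j])) for row in pssm]
        a, b = gm11_coefficients(column)
        features.extend([a, b])
    return features


if __name__ == "__main__":
    # A constant sequence has no growth, so a = 0 and b equals the constant
    print(gm11_coefficients([2.0, 2.0, 2.0, 2.0]))  # prints (0.0, 2.0)
    toy_pssm = [[(i * j) % 7 - 3 for j in range(20)] for i in range(5)]
    print(len(pssm_grey_features(toy_pssm)))  # prints 40
```

A growing sequence such as [1, 2, 3, 4] gives a negative development coefficient a. The sigmoid step is chosen so that every column has strictly increasing running sums, which keeps the least-squares system solvable for sequences of three or more residues.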

Keywords: Ensemble classifier; Multi-label learning; Protein phosphorylation; Random Forests.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • Humans
  • Phosphorylation
  • Proteins / chemistry*
  • Software
  • Systems Theory

Substances

  • Proteins