Benchmarking substrate-based kinase activity inference using phosphoproteomic data

Bioinformatics. 2017 Jun 15;33(12):1845-1851. doi: 10.1093/bioinformatics/btx082.

Abstract

Motivation: Phosphoproteomic experiments are increasingly used to study changes in signaling across different conditions. It has been proposed that changes in the phosphorylation of kinase target sites can be used to infer when the activity of a kinase is being regulated. However, these approaches have not yet been benchmarked due to a lack of appropriate benchmarking strategies.

Results: We used curated phosphoproteomic experiments and a gold standard dataset containing a total of 184 kinase-condition pairs where regulation is expected to occur to benchmark and compare different kinase activity inference strategies: Z-test, Kolmogorov-Smirnov test, Wilcoxon rank sum test, gene set enrichment analysis (GSEA), and a multiple linear regression model. We also tested weighted variants of the Z-test and GSEA that include information on kinase sequence specificity as a proxy for affinity. Finally, we tested how the number of known substrates and the type of evidence (in vivo, in vitro or in silico) supporting them influence the predictions.
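
To make the simplest of these strategies concrete, the sketch below scores one kinase with a Z-test: the mean log fold change of its known substrate sites is compared with the mean over all quantified sites and scaled by the background standard deviation and the square root of the number of substrates. This is one common formulation of a substrate-based Z-score and is offered as an assumption, not as the authors' exact implementation; the function names `kinase_z_score` and `two_sided_p` and the dictionary layout are hypothetical.

```python
import math
from statistics import mean, pstdev


def kinase_z_score(site_log_fc, substrates):
    """Z-score for one kinase in one condition (illustrative sketch).

    site_log_fc: dict mapping a phosphosite identifier to its log fold change.
    substrates:  iterable of site identifiers annotated as targets of the kinase.
    Returns None when no substrate was quantified or the background is constant.
    """
    background = list(site_log_fc.values())
    mu = mean(background)        # mean over all quantified sites
    sigma = pstdev(background)   # population standard deviation of the background
    observed = [site_log_fc[s] for s in substrates if s in site_log_fc]
    if not observed or sigma == 0:
        return None
    # Standardize the substrate mean against the background distribution.
    return (mean(observed) - mu) * math.sqrt(len(observed)) / sigma


def two_sided_p(z):
    """Two-sided p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    log_fc = {"siteA": 2.0, "siteB": 2.0, "siteC": -2.0, "siteD": -2.0}
    z = kinase_z_score(log_fc, ["siteA", "siteB"])
    print(z, two_sided_p(z))
```

A positive score suggests that the kinase's substrates are, on average, more phosphorylated than the background in that condition, and a negative score suggests the opposite. A weighted variant would replace the plain substrate mean with a weighted one.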

Conclusions: Most models performed well, with the Z-test and GSEA performing best as determined by the area under the ROC curve (mean AUC = 0.722). Weighting kinase targets by the kinase's target sequence preference improves the results only marginally. However, the number of known substrates and the evidence supporting the interactions have a strong effect on the predictions.
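
For readers unfamiliar with the metric, the following sketch shows how an area under the ROC curve can be computed from activity scores: it is the probability that a randomly chosen positive (a kinase-condition pair where regulation is expected) scores higher than a randomly chosen negative, with ties counted as one half. How the negative set is constructed is not described in this abstract, so the sketch simply assumes both score lists are given; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (illustrative sketch).

    pos_scores: scores for pairs expected to be regulated (gold standard).
    neg_scores: scores for pairs assumed not to be regulated.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    print(roc_auc([0.9, 0.8], [0.1, 0.85]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups.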

Availability and implementation: The KSEA implementation is available at https://github.com/evocellnet/ksea. Additional data are available at http://phosfate.com.

Contact: pbeltrao@ebi.ac.uk or ochoa@ebi.ac.uk.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Computer Simulation
  • Humans
  • Phosphoproteins / metabolism*
  • Phosphotransferases / metabolism*
  • Proteomics / methods*
  • Signal Transduction
  • Software*

Substances

  • Phosphoproteins
  • Phosphotransferases