Sequence-based prediction of protein-peptide binding sites using support vector machine

J Comput Chem. 2016 May 15;37(13):1223-9. doi: 10.1002/jcc.24314. Epub 2016 Feb 2.

Abstract

Protein-peptide interactions are essential for all cellular processes, including DNA repair, replication, gene expression, and metabolism. As most protein-peptide interactions are uncharacterized, it is cost-effective to investigate them computationally as a first step. All existing approaches for predicting protein-peptide binding sites, however, are based on protein structures, despite the fact that the structures of most proteins are not yet solved. This article proposes SPRINT, the first machine-learning method for Sequence-based prediction of Protein-peptide Residue-level Interactions. SPRINT yields robust and consistent performance in 10-fold cross-validation and on an independent test set. The most important feature is the evolution-generated sequence profile. For the test set (1056 binding and non-binding residues), it yields a Matthews correlation coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence-based technique is comparable to or more accurate than structure-based methods for peptide-binding site prediction. SPRINT is available as an online server at: http://sparks-lab.org/. © 2016 Wiley Periodicals, Inc.
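
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal Python sketch of how sensitivity, specificity, and the Matthews correlation coefficient are computed from a residue-level confusion matrix. The function name `binary_metrics` and the counts below are hypothetical: the abstract does not report the confusion matrix, so the counts are a toy example chosen only to give 64% sensitivity and 68% specificity, and they are not expected to reproduce the reported coefficient of 0.326.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, MCC) for a binary confusion matrix.

    tp/fn: binding residues predicted as binding / non-binding.
    tn/fp: non-binding residues predicted as non-binding / binding.
    """
    sensitivity = tp / (tp + fn)  # fraction of binding residues recovered
    specificity = tn / (tn + fp)  # fraction of non-binding residues recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


# Hypothetical counts (NOT from the paper): 100 binding, 100 non-binding residues.
sens, spec, mcc = binary_metrics(tp=64, fn=36, tn=68, fp=32)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.3f}")
```

Unlike sensitivity and specificity, the Matthews correlation coefficient depends on the ratio of binding to non-binding residues, so the same sensitivity and specificity can correspond to different coefficients on differently composed test sets.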

Keywords: binding site; features; machine learning; prediction; protein-peptide; sequence-based; support vector machine.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Binding Sites
  • Peptides / chemistry*
  • Peptides / metabolism*
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Support Vector Machine*

Substances

  • Peptides
  • Proteins