Structure-based prediction of protein-peptide binding regions using Random Forest

Bioinformatics. 2018 Feb 1;34(3):477-484. doi: 10.1093/bioinformatics/btx614.

Abstract

Motivation: Protein-peptide interactions are among the most important biological interactions and play a crucial role in many diseases, including cancer. Knowledge of these interactions therefore provides invaluable insights into cellular processes, functional mechanisms, and drug discovery. Protein-peptide interactions can be analyzed by studying the structures of protein-peptide complexes. However, only a small fraction of these interactions have known complex structures, and experimental determination of protein-peptide interactions is costly and inefficient. Computational prediction of peptide-binding sites would therefore help improve the efficiency and cost-effectiveness of experimental studies. Here, we established a machine learning method called SPRINT-Str (Structure-based prediction of protein-Peptide Residue-level Interaction) that uses structural information to predict protein-peptide binding residues. These predicted binding residues are then used to infer the peptide-binding site with a clustering algorithm.
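The abstract does not detail the clustering step. As a minimal sketch, assuming each residue is represented by a single coordinate (for example its C-alpha atom) and that two predicted binding residues count as neighbors when they lie within a hypothetical 8 Å cutoff, single-linkage clustering groups the predicted residues into candidate sites, and the largest cluster is reported as the predicted peptide-binding site. The function name, the coordinate representation and the cutoff are illustrative assumptions, not SPRINT-Str's published procedure or parameters.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def cluster_binding_residues(
    coords: Dict[int, Coord], predicted: List[int], cutoff: float = 8.0
) -> List[List[int]]:
    """Hypothetical single-linkage clustering of predicted binding residues.

    Two residues end up in the same cluster if they are connected by a chain
    of residue pairs whose distance is <= cutoff (assumed unit: angstroms).
    Returns clusters sorted largest first (ties: smallest residue index first).
    """
    residues = sorted(set(predicted))
    parent = {r: r for r in residues}  # union-find forest

    def find(r: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Merge every pair of predicted residues that are spatial neighbors.
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    # Collect residues by their root; each root defines one cluster.
    clusters: Dict[int, List[int]] = {}
    for r in residues:
        clusters.setdefault(find(r), []).append(r)
    return sorted(clusters.values(), key=lambda c: (-len(c), c[0]))


# Toy coordinates (hypothetical), keyed by residue number.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (6.0, 0.0, 0.0),
          10: (30.0, 0.0, 0.0), 11: (33.0, 0.0, 0.0)}
print(cluster_binding_residues(coords, [1, 2, 3, 10, 11]))  # [[1, 2, 3], [10, 11]]
```

Here residues 1 to 3 form one cluster and residues 10 and 11 another; the larger first cluster would be taken as the site. Union-find makes the grouping independent of the order in which residue pairs are visited.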

Results: SPRINT-Str achieves robust and consistent results for prediction of protein-peptide binding regions at both the residue and the site level. The Matthews correlation coefficient (MCC) is 0.27 for 10-fold cross-validation and 0.293 for the independent test set, and the corresponding areas under the curve (AUC) are 0.775 and 0.782. The prediction outperforms other state-of-the-art methods, including our previously developed sequence-based method. Further spatial neighbor clustering of the predicted binding residues yields binding-site predictions with 20-116% higher coverage than the next-best method at all precision levels in the test set. Application of SPRINT-Str to proteins that bind DNA, RNA and carbohydrates confirms the method's ability to separate peptide-binding sites from other functional sites. More importantly, similar performance in predicting binding residues and sites is obtained when experimentally determined structures are replaced by unbound structures or by good-quality model structures built from homologs, indicating its wide applicability.
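For readers less familiar with the two residue-level metrics reported here, the sketch computes MCC from thresholded binary predictions and AUC from raw per-residue scores, using the standard definitions. The toy labels, scores and the 0.5 threshold are made-up illustrations, not SPRINT-Str data, and the paper's own evaluation (for example, how its decision threshold was chosen) may differ.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = binding residue)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen binding residue
    scores higher than a randomly chosen non-binding one (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy per-residue data (hypothetical): true labels and predicted probabilities.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
threshold = 0.5  # hypothetical decision threshold
y_pred = [1 if s >= threshold else 0 for s in scores]

print(round(mcc(y_true, y_pred), 3))      # 0.333
print(round(roc_auc(y_true, scores), 3))  # 0.778
```

MCC depends on the chosen threshold, whereas AUC summarizes how well the scores rank binding above non-binding residues across all thresholds, so the two numbers capture complementary aspects of performance.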

Availability and implementation: http://sparks-lab.org/server/SPRINT-Str.

Contact: yangyd25@mail.sysu.edu.cn.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology / methods
  • Humans
  • Machine Learning*
  • Peptides / chemistry
  • Peptides / metabolism*
  • Protein Binding
  • Protein Domains
  • Protein Tyrosine Phosphatase, Non-Receptor Type 4 / metabolism
  • Proteins / chemistry
  • Proteins / metabolism*
  • Sequence Analysis, Protein / methods*

Substances

  • Peptides
  • Proteins
  • PTPN4 protein, human
  • Protein Tyrosine Phosphatase, Non-Receptor Type 4