Structure-based prediction of protein-peptide binding regions using Random Forest

Ghazaleh Taherzadeh; Yaoqi Zhou; Alan Wee-Chung Liew; Yuedong Yang

doi:10.1093/bioinformatics/btx614


Bioinformatics. 2018 Feb 1;34(3):477-484. doi: 10.1093/bioinformatics/btx614.

Authors

Ghazaleh Taherzadeh¹, Yaoqi Zhou¹,², Alan Wee-Chung Liew¹, Yuedong Yang¹,²,³

Affiliations

¹ School of Information and Communication Technology.
² Institute for Glycomics, Griffith University, Parklands Drive, Southport, QLD 4215, Australia.
³ School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510275, China.

PMID: 29028926
DOI: 10.1093/bioinformatics/btx614

Abstract

Motivation: Protein-peptide interactions are among the most important biological interactions and play a crucial role in many diseases, including cancer. Knowledge of these interactions therefore provides invaluable insights into cellular processes, functional mechanisms and drug discovery. Protein-peptide interactions can be analyzed by studying the structures of protein-peptide complexes. However, only a small fraction of these interactions have known complex structures, and experimental determination of protein-peptide interactions is costly and inefficient. Computational prediction of peptide-binding sites can thus improve the efficiency and cost-effectiveness of experimental studies. Here, we developed a machine-learning method, SPRINT-Str (Structure-based prediction of protein-Peptide Residue-level Interaction), that uses structural information to predict protein-peptide binding residues. The predicted binding residues are then grouped by a clustering algorithm to infer peptide-binding sites.
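The clustering step can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the score threshold, the 6 Å neighbour cutoff, the use of one representative coordinate per residue and the function name `cluster_binding_residues` are all hypothetical choices. Residues whose predicted score passes the threshold are linked when they lie within the cutoff, and each connected group (single linkage) is reported as a candidate binding site, largest first.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]


def cluster_binding_residues(
    coords: Dict[int, Coord],
    scores: Dict[int, float],
    threshold: float = 0.5,  # hypothetical score cutoff for "predicted binding"
    cutoff: float = 6.0,  # hypothetical neighbour distance in angstroms
) -> List[List[int]]:
    """Group predicted binding residues into spatially connected clusters.

    Hypothetical sketch (single linkage): two predicted residues are
    neighbours if their representative coordinates lie within `cutoff`;
    each connected component of that neighbour graph is one cluster.
    """
    predicted = sorted(r for r, s in scores.items() if s >= threshold)
    seen = set()
    clusters = []
    for start in predicted:
        if start in seen:
            continue
        # Depth-first search collects every predicted residue reachable
        # from `start` through a chain of neighbours.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            r = stack.pop()
            component.append(r)
            for other in predicted:
                if other not in seen and math.dist(coords[r], coords[other]) <= cutoff:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    # Report the largest cluster first as the top candidate site.
    clusters.sort(key=len, reverse=True)
    return clusters


# Toy example: residue IDs mapped to made-up C-alpha coordinates and scores.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (5.0, 0.0, 0.0),
          10: (30.0, 0.0, 0.0), 11: (33.0, 0.0, 0.0), 20: (60.0, 0.0, 0.0)}
scores = {1: 0.9, 2: 0.7, 3: 0.2, 10: 0.8, 11: 0.6, 20: 0.55}
print(cluster_binding_residues(coords, scores))  # [[1, 2], [10, 11], [20]]
```

Residue 3 lies close to residue 1 but is left out because its score is below the threshold, while residues 10 and 11 form a separate candidate site.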

Results: SPRINT-Str achieves robust and consistent results in predicting protein-peptide binding regions at both the residue and the site level. The Matthews correlation coefficient (MCC) is 0.27 for 10-fold cross-validation and 0.293 for the independent test set, and the corresponding areas under the curve (AUC) are 0.775 and 0.782. The method outperforms other state-of-the-art methods, including our previously developed sequence-based method. Further clustering of the predicted binding residues by spatial neighborhood yields binding-site predictions with 20-116% higher coverage than the next-best method at all precision levels on the test set. Applying SPRINT-Str to proteins that bind DNA, RNA and carbohydrates confirms that the method can separate peptide-binding sites from other functional sites. More importantly, similar performance in predicting binding residues and sites is obtained when experimentally determined structures are replaced by unbound structures or by high-quality model structures built from homologs, indicating that the method is widely applicable.
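The two residue-level metrics quoted above can be computed as follows. This is a generic Python sketch, not SPRINT-Str code: it assumes binary labels (1 = binding residue), a hypothetical 0.5 cutoff for turning scores into predictions, and that "area under the curve" refers to the ROC curve. The AUC is computed as the probability that a randomly chosen binding residue scores higher than a randomly chosen non-binding one, which equals the area under the ROC curve; the pairwise loop is quadratic and meant only for small examples.

```python
import math
from typing import Sequence


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (1 = binding residue)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a random positive
    outscores a random negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy residue labels and predicted scores (made up for illustration).
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 cutoff
print(round(mcc(y_true, y_pred), 3))      # 0.333
print(round(roc_auc(y_true, scores), 3))  # 0.778
```

MCC depends on the chosen score cutoff while AUC does not, so reporting both separates the choice of threshold from the overall ranking quality of the predictor.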

Availability and implementation: http://sparks-lab.org/server/SPRINT-Str.

Contact: yangyd25@mail.sysu.edu.cn.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Computational Biology / methods
Humans
Machine Learning*
Peptides / chemistry
Peptides / metabolism*
Protein Binding
Protein Domains
Protein Tyrosine Phosphatase, Non-Receptor Type 4 / metabolism
Proteins / chemistry
Proteins / metabolism*
Sequence Analysis, Protein / methods*

Substances

Peptides
Proteins
PTPN4 protein, human
Protein Tyrosine Phosphatase, Non-Receptor Type 4