Protein-Protein Interactions Efficiently Modeled by Residue Cluster Classes

Int J Mol Sci. 2020 Jul 6;21(13):4787. doi: 10.3390/ijms21134787.

Abstract

Predicting protein-protein interactions (PPIs) is an important challenge in structural bioinformatics. Current computational methods predict these interactions with varying degrees of accuracy. Several factors have been proposed to improve these predictions, among them the choice of protein descriptors used to represent the interactions. In the current work, we provide a representation of protein structure, referred to as residue cluster classes, that is amenable to PPI classification using machine learning approaches. Through sampling and optimization, we identified the best algorithm-parameter pair to classify PPIs from more than 360 different training sets. We tested these classifiers against PPI datasets that were not included in the training set but shared sequence similarity with proteins in the training set, reproducing the common situation in which most proteins share sequence similarity with others. We identified a model with almost no PPI classification error (96-99% of instances correctly classified) and showed that the residue cluster classes of protein pairs display distinct patterns for positive and negative protein interactions. Our results indicate that residue cluster classes are structural features relevant to modeling PPIs and provide a novel tool to mathematically model the protein structure/function relationship.
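
The selection of an algorithm-parameter pair described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-element feature vectors, the two toy classifiers (k-nearest neighbors and nearest centroid), and the tiny datasets are hypothetical and are not the descriptors, algorithms, or data used in the paper. The sketch only shows the general pattern of scoring candidate pairs on a validation set and then measuring the chosen pair on data held out from training.

```python
# Toy sketch: choose the best (algorithm, parameter) pair by accuracy.
# All data, names, and classifiers are illustrative assumptions, not the
# descriptors, algorithms, or datasets used in the paper.
from collections import Counter
from math import dist


def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda item: dist(item[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train, x, _param=None):
    """Label of the class whose mean feature vector is closest to x."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }
    return min(centroids, key=lambda label: dist(centroids[label], x))


def accuracy(predict, param, train, test):
    """Fraction of correctly classified instances in `test`."""
    hits = sum(predict(train, x, param) == y for x, y in test)
    return hits / len(test)


# Hypothetical feature vectors for protein pairs (1 = interacting, 0 = not).
train = [((5, 1, 0, 4), 1), ((6, 2, 1, 5), 1), ((5, 2, 0, 5), 1),
         ((1, 6, 4, 0), 0), ((0, 5, 5, 1), 0), ((1, 5, 4, 1), 0)]
validation = [((5, 1, 1, 5), 1), ((1, 6, 5, 1), 0)]
held_out = [((6, 1, 1, 4), 1), ((0, 6, 5, 0), 0), ((4, 2, 1, 5), 1)]

# Candidate (algorithm name, prediction function, parameter) triples.
candidates = [("knn", knn_predict, 1), ("knn", knn_predict, 3),
              ("centroid", centroid_predict, None)]

# Score every candidate on the validation set and keep the best one.
scores = {(name, param): accuracy(fn, param, train, validation)
          for name, fn, param in candidates}
best_name, best_param = max(scores, key=scores.get)

# Measure the selected pair on data never used for training or selection.
best_fn = {name: fn for name, fn, _ in candidates}[best_name]
held_out_accuracy = accuracy(best_fn, best_param, train, held_out)
```

Keeping the held-out set separate from the set used for selection is the design point: the final accuracy is then measured on pairs that played no part in choosing the model.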

Keywords: machine learning; protein–protein interaction; residue cluster class.

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases, Protein / statistics & numerical data*
  • Machine Learning*
  • Protein Interaction Mapping / methods*
  • Proteins / chemistry*
  • Sequence Analysis, Protein / methods

Substances

  • Proteins