PEPPI: Whole-proteome Protein-protein Interaction Prediction through Structure and Sequence Similarity, Functional Association, and Machine Learning

J Mol Biol. 2022 Jun 15;434(11):167530. doi: 10.1016/j.jmb.2022.167530. Epub 2022 Mar 5.

Abstract

Proteome-wide identification of protein-protein interactions is a formidable task which has yet to be sufficiently addressed by experimental methodologies. Many computational methods have been developed to predict proteome-wide interaction networks, but few leverage both the sensitivity of structural information and the wide availability of sequence data. We present PEPPI, a pipeline which integrates structural similarity, sequence similarity, functional association data, and machine learning-based classification through a naïve Bayesian classifier model to accurately predict protein-protein interactions at a proteomic scale. Through benchmarking against a set of 798 ground truth interactions and an equal number of non-interactions, we have found that PEPPI attains 4.5% higher AUROC than the best of other state-of-the-art methods. As a proteomic-scale application, PEPPI was applied to model the interactions which occur between SARS-CoV-2 and human host cells during coronavirus infection, where 403 high-confidence interactions were identified with predictions covering 73% of a gold standard dataset from PSICQUIC and demonstrating significant complementarity with the most recent high-throughput experiments. PEPPI is available both as a webserver and in a standalone version and should be a powerful and generally applicable tool for computational screening of protein-protein interactions.

Keywords: SARS-CoV-2; dimer threading; direct interaction prediction; interactome; interology.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • COVID-19
  • Humans
  • Machine Learning*
  • Protein Interaction Mapping*
  • Proteome* / chemistry
  • Proteomics
  • SARS-CoV-2
  • Software*

Substances

  • Proteome