HPiP: an R/Bioconductor package for predicting host-pathogen protein-protein interactions from protein sequences using ensemble machine learning approach

Bioinform Adv. 2022 May 23;2(1):vbac038. doi: 10.1093/bioadv/vbac038. eCollection 2022.

Abstract

Motivation: Despite arduous and time-consuming experimental efforts, the protein-protein interactions (PPIs) between many pathogenic microbes and their human host remain unknown. This limits our understanding of the intricate interactions that occur during infection and hinders the identification of therapeutic targets. Because computational tools offer a promising alternative, we developed HPiP (Host-Pathogen Interaction Prediction), an R/Bioconductor package that combines a series of amino acid sequence property descriptors with ensemble machine learning classifiers to predict as-yet unmapped interactions between pathogen and host proteins.
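To illustrate what a sequence property descriptor is, the sketch below computes amino acid composition (AAC), a common descriptor that represents a protein by the relative frequency of each of the 20 standard amino acids. It then concatenates the pathogen and host vectors into a single feature vector for one protein pair. This is a hypothetical Python sketch, not HPiP's R API. The names `aac` and `pair_features` are illustrative, and HPiP's actual descriptor set and pair encoding may differ.

```python
from typing import List

# The 20 standard amino acids in a fixed order, so every vector has the same layout.
STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Hypothetical AAC descriptor: fraction of each standard amino acid in `sequence`."""
    # Ignore non-standard or ambiguous residue codes (e.g. X, B, Z).
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    return [residues.count(aa) / n for aa in STANDARD_AA]


def pair_features(pathogen_seq: str, host_seq: str) -> List[float]:
    """Hypothetical pair encoding: pathogen AAC (20 values) followed by host AAC (20 values)."""
    return aac(pathogen_seq) + aac(host_seq)


if __name__ == "__main__":
    features = pair_features("MKTAYIAK", "GAVLIPFW")
    print(len(features))
```

Feature vectors of this kind, computed for pairs labeled as interacting or non-interacting, are the sort of input on which an ensemble of classifiers could then be trained.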

Results: Using PPI training sets between human proteins and either severe acute respiratory syndrome coronavirus 1 (SARS-CoV-1) or the novel SARS-CoV-2 coronavirus as a case study, we show that HPiP performs well in predicting PPIs between SARS-CoV-2 and human proteins. This is supported by several quality control metrics and by experimental confirmation in human monocytic THP-1 cells. HPiP also performed strongly at recovering previously reported PPIs when applied to protein sequences from the pathogenic bacterium Mycobacterium tuberculosis and its human host. Collectively, our fully documented HPiP software will hasten the exploration of PPIs toward a systems-level understanding of many understudied pathogens and help uncover molecular targets for repurposing existing drugs.

Availability and implementation: HPiP is released as open-source code under the MIT license and is freely available on GitHub (https://github.com/BabuLab-UofR/HPiP) and on Bioconductor (http://bioconductor.org/packages/devel/bioc/html/HPiP.html).

Supplementary information: Supplementary data are available at Bioinformatics Advances online.