Prediction of human-Streptococcus pneumoniae protein-protein interactions using logistic regression

Comput Biol Chem. 2021 Jun;92:107492. doi: 10.1016/j.compbiolchem.2021.107492. Epub 2021 Apr 24.

Abstract

Streptococcus pneumoniae is a major cause of mortality in children under five years of age. In recent years, the emergence of antibiotic-resistant S. pneumoniae strains has increased the threat posed by this pathogen. Identifying S. pneumoniae protein virulence factors, for instance by analyzing host-pathogen protein-protein interactions (HP-PPIs), could therefore support the development of new drugs and vaccines. In this study, protein-protein interactions were predicted with a logistic regression model that uses the number of occurrences of each protein domain as features. Trained on HP-PPIs from three different pathogens, the model achieved 57-77% precision, 64-75% recall, and 96-98% specificity. Applying the model to human-S. pneumoniae protein pairs yielded 5823 predicted interactions involving thirty S. pneumoniae proteins and 324 human proteins. Pathway enrichment analysis showed that most of the pathways involved in the predicted interactions are immune system pathways. Network topology analysis identified β-galactosidase (BgaA) as the most central S. pneumoniae protein in the predicted HP-PPI networks, with a degree centrality of 1.0 and a betweenness centrality of 0.451853. Further experimental studies are required to validate the predicted interactions and to examine their roles in S. pneumoniae infection.
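To make the modelling step and the reported metrics concrete, here is a minimal, standard-library-only Python sketch of the general technique: logistic regression over protein-domain occurrence counts, evaluated by precision, recall, and specificity. The abstract does not say how a host-pathogen pair is encoded, which optimizer or regularization was used, or what decision threshold was applied. This sketch therefore assumes a simple encoding (host domain counts concatenated with pathogen domain counts), trains with plain batch gradient descent at a 0.5 threshold, and uses hypothetical domain IDs (`DOM_A`, `DOM_B`, `DOM_C`) and made-up toy pairs. It illustrates the idea and is not the authors' pipeline.

```python
import math
from collections import Counter


def pair_features(host_domains, pathogen_domains, vocab):
    # Hypothetical encoding (assumption): occurrence count of each domain in the
    # host protein, followed by the count of each domain in the pathogen protein.
    host_counts = Counter(host_domains)
    pathogen_counts = Counter(pathogen_domains)
    return [host_counts[d] for d in vocab] + [pathogen_counts[d] for d in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    # Predicted interaction probability for one feature vector.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=5000):
    # Batch gradient descent on the mean cross-entropy (logistic) loss, no regularization.
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = score(w, b, xi) - yi  # derivative of the loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x, threshold=0.5):
    return 1 if score(w, b, x) >= threshold else 0


def confusion_metrics(y_true, y_pred):
    # Precision = TP/(TP+FP), recall = TP/(TP+FN), specificity = TN/(TN+FP).
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, specificity


# Toy data with made-up domain IDs: a pair "interacts" when the host protein
# carries DOM_A and the pathogen protein carries DOM_B.
vocab = ["DOM_A", "DOM_B", "DOM_C"]
toy_pairs = [
    (["DOM_A"], ["DOM_B"], 1),
    (["DOM_A", "DOM_C"], ["DOM_B"], 1),
    (["DOM_A"], ["DOM_B", "DOM_B"], 1),
    (["DOM_C"], ["DOM_B"], 0),
    (["DOM_A"], ["DOM_C"], 0),
    (["DOM_C"], ["DOM_C"], 0),
    (["DOM_B"], ["DOM_A"], 0),
]
X = [pair_features(h, p, vocab) for h, p, _ in toy_pairs]
y = [label for _, _, label in toy_pairs]
w, b = train_logreg(X, y)
y_pred = [predict(w, b, x) for x in X]
print("precision=%.2f recall=%.2f specificity=%.2f" % confusion_metrics(y, y_pred))
```

The toy set is linearly separable, so the model fits it perfectly. On realistic HP-PPI data, negatives vastly outnumber positives. This is why specificity (driven by the many true negatives) can be high while precision stays much lower, as in the ranges reported above.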

Keywords: Host-pathogen protein-protein interactions; Logistic regression; Network centrality; Pathway enrichment; Streptococcus pneumoniae.

MeSH terms

  • Host-Pathogen Interactions
  • Humans
  • Logistic Models
  • Protein Binding
  • Proteins / chemistry*
  • Streptococcus pneumoniae / chemistry*

Substances

  • Proteins