Critical assessment and performance improvement of plant-pathogen protein-protein interaction prediction methods

Brief Bioinform. 2019 Jan 18;20(1):274-287. doi: 10.1093/bib/bbx123.

Abstract

The identification of plant-pathogen protein-protein interactions (PPIs) is an attractive and challenging research topic for deciphering the complex molecular mechanisms of plant immunity and pathogen infection. Because the experimental identification of plant-pathogen PPIs is time-consuming and labor-intensive, computational methods are emerging as an important strategy to complement experimental methods. In this work, we first evaluated the performance of traditional computational methods, namely interolog, domain-domain interaction and domain-motif interaction, in predicting known plant-pathogen PPIs. Owing to the low sensitivity of these traditional methods, we used Random Forest to build an inter-species PPI prediction model based on multiple sequence encodings and novel network attributes derived from the established plant PPI network. Critical assessment of the features demonstrated that integrating sequence information and network attributes resulted in significant and robust performance improvement. We also discussed the influence of Gene Ontology and gene expression information on the prediction performance. The Web server implementing the integrated prediction method, named InterSPPI, has been made freely available at http://systbio.cau.edu.cn/intersppi/index.php. InterSPPI achieved a precision of 73.8% and a recall of 76.6% in the independent test. To examine the applicability of InterSPPI, we also conducted cross-species and proteome-wide plant-pathogen PPI prediction tests. Taken together, we hope this work provides a comprehensive understanding of the current status of plant-pathogen PPI prediction, and that InterSPPI can become a useful tool to accelerate the exploration of plant-pathogen interactions.
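
The idea of encoding a protein pair from sequence and scoring predictions by precision and recall can be illustrated with a minimal Python sketch. It is not the InterSPPI implementation: the abstract does not name the sequence encodings used, so amino acid composition is assumed here as one simple, common choice, and the function names (aac, encode_pair, precision_recall) are hypothetical. The network attributes and the Random Forest classifier itself are omitted.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(seq):
    """Amino acid composition: the fraction of each standard residue in seq.

    Assumed encoding for illustration only; non-standard characters are ignored.
    """
    counts = Counter(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def encode_pair(plant_seq, pathogen_seq):
    """Concatenate the two per-protein encodings into one 40-value feature vector."""
    return aac(plant_seq) + aac(pathogen_seq)


def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = interacting, 0 = non-interacting)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Under these assumptions, encode_pair(plant_seq, pathogen_seq) yields one row of a feature matrix that a classifier such as a random forest could be trained on, and precision_recall compares its predicted labels with known interactions in a held-out test set.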

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Arabidopsis / genetics
  • Arabidopsis / metabolism
  • Arabidopsis / microbiology
  • Arabidopsis Proteins / genetics
  • Arabidopsis Proteins / immunology
  • Arabidopsis Proteins / metabolism
  • Computational Biology / methods
  • Databases, Protein / statistics & numerical data
  • Gene Expression Profiling / statistics & numerical data
  • Gene Ontology
  • Host-Pathogen Interactions / genetics
  • Host-Pathogen Interactions / immunology
  • Machine Learning
  • Models, Biological
  • Plant Diseases / genetics
  • Plant Diseases / immunology
  • Plant Diseases / microbiology
  • Plant Immunity / genetics
  • Plant Proteins / genetics
  • Plant Proteins / immunology
  • Plant Proteins / metabolism*
  • Plants / genetics
  • Plants / metabolism*
  • Plants / microbiology*
  • Protein Interaction Mapping / methods*
  • Protein Interaction Mapping / statistics & numerical data

Substances

  • Arabidopsis Proteins
  • Plant Proteins