Genome-wide inference of protein interaction sites: lessons from the yeast high-quality negative protein-protein interaction dataset

Jie Guo; Xiaomei Wu; Da-Yong Zhang; Kui Lin

doi:10.1093/nar/gkn016

Nucleic Acids Res. 2008 Apr;36(6):2002-11. doi: 10.1093/nar/gkn016. Epub 2008 Feb 14.

Authors

Jie Guo¹, Xiaomei Wu, Da-Yong Zhang, Kui Lin

Affiliation

¹ MOE Key Laboratory for Biodiversity Science and Ecological Engineering and College of Life Sciences, Beijing Normal University, Beijing 100875, China.

Abstract

High-throughput experimental and computational studies of protein interactions may have produced the most comprehensive protein-protein interaction datasets available for completely sequenced genomes. These datasets provide an opportunity to discover the underlying protein interaction patterns on a proteome scale. Here, we propose an approach for discovering motif pairs at interaction sites (often 3-8 residues long), which are essential for understanding protein function and helpful for the rational design of protein engineering and folding experiments. A gold standard positive (interacting) dataset and a gold standard negative (non-interacting) dataset were mined to infer interacting motif pairs that are significantly overrepresented in the positive dataset relative to the negative dataset. Four negative datasets assembled by different strategies were evaluated, and the best-performing one was used as the gold standard negative dataset for further analysis. To assess how effectively our method detects potential interacting motif pairs, we compared it with previously developed approaches and found that it achieved the highest prediction accuracy. In addition, many uncharacterized motif pairs of interest were found to be functional, supported by experimental evidence in other species. This investigation demonstrates the important effect of a high-quality negative dataset on the performance of such statistical inference.
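
The overrepresentation idea described in the abstract can be illustrated with a small sketch. The abstract does not name the statistic the authors used, so the one-sided Fisher's exact test below is an assumption, and the function names, the toy sequences and the motifs "PPLP" and "WWY" are all hypothetical, chosen only for illustration.

```python
from math import comb


def count_pairs_with_motifs(pairs, motif_a, motif_b):
    """Count protein pairs in which one partner contains motif_a
    and the other partner contains motif_b (in either order)."""
    count = 0
    for p, q in pairs:
        if (motif_a in p and motif_b in q) or (motif_b in p and motif_a in q):
            count += 1
    return count


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are 'successes' (one-sided Fisher's exact test)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enrichment(positive, negative, motif_a, motif_b):
    """Return (hits in positives, hits in negatives, one-sided p-value)
    for a motif pair. Assumed scoring scheme, not the paper's own."""
    a = count_pairs_with_motifs(positive, motif_a, motif_b)
    c = count_pairs_with_motifs(negative, motif_a, motif_b)
    n_pos = len(positive)
    n_all = len(positive) + len(negative)
    p_value = hypergeom_upper_tail(a, a + c, n_pos, n_all)
    return a, c, p_value


# Toy data (hypothetical sequences, not taken from any real dataset).
positive = [("AAPPLPAA", "GGWWYGG"), ("CCPPLPCC", "TTWWYTT"),
            ("WWYKK", "KPPLP"), ("MMMM", "NNNN")]
negative = [("AAPPLPAA", "MMMM"), ("GGWWYGG", "NNNN"),
            ("MMMM", "NNNN"), ("KKKK", "LLLL")]

a, c, p_value = enrichment(positive, negative, "PPLP", "WWY")
print(a, c, round(p_value, 4))
```

With real data the same test would be repeated over many candidate motif pairs, so a multiple-testing correction would be needed. The sketch also makes the abstract's closing point concrete: the p-value depends directly on how many negative pairs contain the motif pair, so a noisy negative set can hide or fabricate enrichment.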

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Data Interpretation, Statistical
Databases, Protein
Genomics*
Protein Interaction Domains and Motifs*
Protein Interaction Mapping* / standards
Reference Standards
Yeasts / genetics*