Choosing negative examples for the prediction of protein-protein interactions

Asa Ben-Hur; William Stafford Noble

doi:10.1186/1471-2105-7-S1-S2

BMC Bioinformatics. 2006 Mar 20;7 Suppl 1(Suppl 1):S2. doi: 10.1186/1471-2105-7-S1-S2.

Authors

Asa Ben-Hur¹, William Stafford Noble

Affiliation

¹ Department of Computer Science, Colorado State University, Fort Collins CO, USA. asa@cs.colostate.edu

Abstract

The protein-protein interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. This need has prompted the development of a number of methods for predicting protein-protein interactions based on various sources of data and methodologies. The common method for choosing negative examples for training a predictor of protein-protein interactions is based on annotations of cellular localization, and the observation that pairs of proteins that have different localization patterns are unlikely to interact. While this method leads to high-quality sets of non-interacting proteins, we find that this choice can lead to biased estimates of prediction accuracy, because the constraints placed on the distribution of the negative examples make the task easier. The effects of this bias are demonstrated in the context of both sequence-based and non-sequence-based features used for predicting protein-protein interactions.
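As a rough illustration of the selection scheme described above, the following Python sketch builds negative examples from a toy protein set in two ways: by keeping only non-interacting pairs whose annotated compartments do not overlap, and by sampling uniformly from all non-interacting pairs. The protein names, the localization annotations, the function names, and the uniform-sampling baseline are all assumptions made for illustration; the abstract does not specify the authors' exact procedure or data.

```python
import itertools
import random

# Hypothetical toy data: protein -> set of annotated compartments.
localization = {
    "A": {"nucleus"},
    "B": {"nucleus"},
    "C": {"cytoplasm"},
    "D": {"mitochondrion"},
    "E": {"nucleus", "cytoplasm"},
}
# Hypothetical known interactions (positive examples).
positives = [("A", "B"), ("C", "E")]


def candidate_negatives(proteins, positives):
    """Return all unordered protein pairs not listed as interacting."""
    known = {frozenset(pair) for pair in positives}
    return [
        pair
        for pair in itertools.combinations(sorted(proteins), 2)
        if frozenset(pair) not in known
    ]


def localization_negatives(proteins, positives, localization):
    """Keep non-interacting pairs whose compartment annotations are disjoint."""
    selected = []
    for a, b in candidate_negatives(proteins, positives):
        loc_a = localization.get(a, set())
        loc_b = localization.get(b, set())
        # Both proteins must be annotated, and they must share no compartment.
        if loc_a and loc_b and not (loc_a & loc_b):
            selected.append((a, b))
    return selected


def random_negatives(proteins, positives, n, seed=0):
    """Sample n non-interacting pairs uniformly, ignoring localization."""
    pool = candidate_negatives(proteins, positives)
    return random.Random(seed).sample(pool, min(n, len(pool)))


proteins = list(localization)
all_candidates = candidate_negatives(proteins, positives)
constrained = localization_negatives(proteins, positives, localization)
unconstrained = random_negatives(proteins, positives, n=4)
```

In this toy setting the localization filter discards every candidate pair that shares a compartment, so the constrained negatives are drawn from a narrower distribution than the full pool of non-interacting pairs. That narrowing is the kind of constraint the abstract identifies as making the classification task easier.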

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Binding Sites
Computational Biology / methods*
Databases, Protein
Molecular Conformation
Oligonucleotide Array Sequence Analysis
Phosphorylation
Protein Folding
Protein Interaction Mapping*
Proteins / chemistry
Proteomics
ROC Curve
Sequence Alignment
Software

Substances

Proteins
