A probabilistic framework to predict protein function from interaction data integrated with semantic knowledge

BMC Bioinformatics. 2008 Sep 18;9:382. doi: 10.1186/1471-2105-9-382.

Abstract

Background: The functional characterization of newly discovered proteins has been a challenge in the post-genomic era. Protein-protein interactions provide insight into function because the function of an unknown protein can be postulated from its interactions with known proteins. High-throughput experimental methods have enriched the protein-protein interaction data sets. However, functional analysis based on interaction data is limited in accuracy because the data contain experimentally generated false positives as well as interactions that lack functional linkage.

Results: Protein-protein interaction data can be integrated with the functional knowledge in the Gene Ontology (GO) database. We apply similarity measures to assess the functional similarity between interacting proteins. We present a probabilistic framework for predicting the functions of unknown proteins based on this functional similarity. We use leave-one-out cross-validation to compare performance. The experimental results demonstrate that our algorithm achieves higher prediction accuracy than competing methods. In particular, it handles the high false positive rates of current interaction data well.
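The general idea of similarity-weighted prediction evaluated by leave-one-out cross-validation can be illustrated with a minimal sketch. This is not the framework of the paper: the abstract gives no formulas, so the noisy-OR scoring rule is a swapped-in assumption, and the protein names, function labels, edge weights, and the 0.5 threshold are all hypothetical toy values. Each edge weight stands in for a functional-similarity score between two interacting proteins.

```python
from collections import defaultdict

# Hypothetical toy annotations: protein -> set of function labels.
annotations = {
    "P1": {"transport"},
    "P2": {"transport"},
    "P3": {"signaling"},
    "P4": {"transport", "signaling"},
}

# Hypothetical interactions. Each weight in [0, 1] stands in for a
# functional-similarity score between the two partners (assumed values).
interactions = {
    ("P1", "P2"): 0.9,
    ("P1", "P3"): 0.2,
    ("P2", "P4"): 0.7,
    ("P3", "P4"): 0.8,
}


def neighbors(protein):
    """Yield (partner, weight) for every interaction involving protein."""
    for (a, b), weight in interactions.items():
        if a == protein:
            yield b, weight
        elif b == protein:
            yield a, weight


def score_functions(protein, known):
    """Noisy-OR score per function (an assumed rule, not the paper's):
    1 - product of (1 - weight) over neighbors annotated with it."""
    miss = defaultdict(lambda: 1.0)
    for partner, weight in neighbors(protein):
        for function in known.get(partner, ()):
            miss[function] *= 1.0 - weight
    return {function: 1.0 - m for function, m in miss.items()}


def leave_one_out(threshold=0.5):
    """Hide each protein's annotations in turn and predict them back."""
    predictions = {}
    for protein in annotations:
        known = {p: fs for p, fs in annotations.items() if p != protein}
        scores = score_functions(protein, known)
        predictions[protein] = {f for f, s in scores.items() if s >= threshold}
    return predictions


predictions = leave_one_out()
for protein, predicted in sorted(predictions.items()):
    print(protein, sorted(predicted), "true:", sorted(annotations[protein]))
```

In this toy setting, the low-weight edge between P1 and P3 contributes little to the score for P1, which is how a similarity weight can dampen the effect of a likely false positive interaction.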

Conclusion: Experimentally determined protein-protein interactions are too error-prone to reliably reveal the functional associations among proteins. Function prediction for uncharacterized proteins can be improved by integrating the multiple data sources available.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Artificial Intelligence
  • Binding Sites
  • Computer Simulation
  • Data Interpretation, Statistical
  • Database Management Systems*
  • Databases, Protein*
  • Models, Biological*
  • Models, Chemical*
  • Models, Statistical
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Semantics
  • Structure-Activity Relationship
  • Systems Integration