Using data fusion for scoring reliability of protein-protein interactions

J Bioinform Comput Biol. 2014 Aug;12(4):1450014. doi: 10.1142/S0219720014500140. Epub 2014 Jul 2.

Abstract

Protein-protein interactions (PPIs) are important for understanding the cellular mechanisms of biological functions, but the reliability of PPIs extracted by high-throughput assays is known to be low. To address this, many current methods use multiple pieces of evidence from different sources of information to compute reliability scores for such PPIs. However, they often combine the evidence without taking into account the uncertainty of the evidence values, potential dependencies between the information sources used, and missing values from some information sources. We propose to formulate the task of scoring PPIs using multiple information sources as a multi-criteria decision-making problem that can be solved using data fusion to model potential interactions between the information sources. Using data fusion, the contribution of each information source can be proportioned accordingly to systematically score the reliability of PPIs. Our experimental results showed that the reliability scores assigned by our data fusion method can effectively classify highly reliable PPIs from multiple information sources, with substantial improvement in scoring over conventional approaches such as the Adjust CD-Distance approach. In addition, the underlying interactions between the information sources used, as well as their relative importance, can also be determined with our data fusion approach. We also showed that such knowledge can be used to effectively handle missing values from information sources.

Keywords: Choquet fuzzy integral; protein-protein interaction; data fusion; missing information; reliability.
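
For readers unfamiliar with the Choquet fuzzy integral named in the keywords, the following is a minimal sketch of how a discrete Choquet integral can fuse per-source evidence scores into a single reliability score. It is not the authors' implementation: the source names ("coexpression", "topology"), the fuzzy measure values, and the function name choquet_integral are all hypothetical and chosen only for illustration. The fuzzy measure assigns a weight to every subset of sources, which is what lets the integral reflect interactions between sources rather than treating them as independent.

```python
from typing import Dict, FrozenSet


def choquet_integral(
    scores: Dict[str, float],
    measure: Dict[FrozenSet[str], float],
) -> float:
    """Discrete Choquet integral of `scores` with respect to `measure`.

    `scores` maps each information source to an evidence value in [0, 1].
    `measure` maps subsets of sources to a weight in [0, 1]; it is assumed
    to be monotone, with the full set of sources mapped to 1.0.
    """
    remaining = set(scores)  # sources whose score is >= the current level
    total = 0.0
    previous = 0.0
    # Visit sources from the lowest score to the highest.
    for name, value in sorted(scores.items(), key=lambda kv: kv[1]):
        # Weight each increment by the measure of the sources that reach it.
        total += (value - previous) * measure[frozenset(remaining)]
        previous = value
        remaining.remove(name)
    return total


# Hypothetical fuzzy measure over two illustrative evidence sources.
example_measure = {
    frozenset({"coexpression", "topology"}): 1.0,
    frozenset({"coexpression"}): 0.6,
    frozenset({"topology"}): 0.5,
}

# Hypothetical evidence values for one candidate PPI.
example_scores = {"coexpression": 0.8, "topology": 0.4}

example_reliability = choquet_integral(example_scores, example_measure)
print(round(example_reliability, 2))  # approximately 0.64
```

In this sketch the two single-source weights sum to more than 1.0, which models partially redundant sources; weights summing to less than 1.0 would instead model sources that reinforce each other. How the measure is learned from data, and how missing values are handled, are left out here because the abstract does not specify them.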

MeSH terms

  • Computational Biology / methods*
  • Decision Making, Computer-Assisted
  • Gene Expression
  • High-Throughput Screening Assays
  • Protein Interaction Mapping / methods*
  • Reproducibility of Results