SemFunSim: a new method for measuring disease similarity by integrating semantic and gene functional association

PLoS One. 2014 Jun 16;9(6):e99415. doi: 10.1371/journal.pone.0099415. eCollection 2014.

Abstract

Background: Measuring similarity between diseases plays an important role in disease-related molecular function research. Functional associations between disease-related genes and semantic associations between diseases are often used to identify pairs of similar diseases from different perspectives. Exploiting both kinds of association together to calculate disease similarity remains a challenge. Therefore, a new method (SemFunSim) that integrates semantic and functional associations is proposed to address this issue.

Methods: SemFunSim is designed as follows. First, FunSim (Functional Similarity) is proposed to calculate disease similarity using disease-related gene sets in a weighted network of human gene function. Next, SemSim (Semantic Similarity) is devised to calculate disease similarity using the relationship between two diseases in the Disease Ontology. Finally, FunSim and SemSim are integrated to measure disease similarity.
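
The three steps can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the toy gene network and ontology, the best-match average used for fun_sim, the shared-ancestor ratio used for sem_sim, and the product used to integrate them. None of these should be read as the definitions used in the paper.

```python
# Hypothetical toy data; none of these values come from the paper.
GENE_WEIGHTS = {                      # weighted gene functional network (assumed)
    frozenset(("G1", "G2")): 0.8,
    frozenset(("G2", "G3")): 0.5,
}
DISEASE_PARENTS = {                   # tiny is-a hierarchy standing in for an ontology
    "disease": [],
    "cancer": ["disease"],
    "lung cancer": ["cancer"],
    "breast cancer": ["cancer"],
}


def gene_sim(a, b):
    """Edge weight between two genes; 1.0 for identical genes, 0.0 if unlinked."""
    if a == b:
        return 1.0
    return GENE_WEIGHTS.get(frozenset((a, b)), 0.0)


def fun_sim(genes1, genes2):
    """Assumed functional similarity: average best-match weight over both gene sets."""
    if not genes1 or not genes2:
        return 0.0
    best1 = sum(max(gene_sim(g, h) for h in genes2) for g in genes1)
    best2 = sum(max(gene_sim(g, h) for h in genes1) for g in genes2)
    return (best1 + best2) / (len(genes1) + len(genes2))


def ancestors(term):
    """Return the term together with all of its ancestors in the toy hierarchy."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in DISEASE_PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def sem_sim(d1, d2):
    """Assumed semantic similarity: shared ancestors divided by all ancestors."""
    a1, a2 = ancestors(d1), ancestors(d2)
    return len(a1 & a2) / len(a1 | a2)


def sem_fun_sim(d1, genes1, d2, genes2):
    """Assumed integration: product of the functional and semantic scores."""
    return fun_sim(genes1, genes2) * sem_sim(d1, d2)


score = sem_fun_sim("lung cancer", {"G1", "G2"}, "breast cancer", {"G2", "G3"})
print(round(score, 4))
```

A product is used here only because it keeps the combined score in [0, 1] and makes it zero when either component is zero; a weighted sum would be an equally plausible placeholder.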

Results: The high average AUC (area under the receiver operating characteristic curve) (96.37%) shows that SemFunSim achieves a high true positive rate and a low false positive rate. 79 of the top 100 pairs of similar diseases identified by SemFunSim are annotated in the Comparative Toxicogenomics Database (CTD) as being targeted by the same therapeutic compounds, while the other methods we compared could identify 35 or fewer such pairs among the top 100. Moreover, when using our method on diseases without annotated compounds in CTD, we could confirm many of our predicted candidate compounds in the literature. This indicates that SemFunSim is an effective method for drug repositioning.
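
The two kinds of evaluation reported above, an AUC and a count of top-ranked pairs found in a reference set, can be sketched as follows. This is a generic illustration rather than the paper's evaluation pipeline: the function names auc and top_k_hits, the toy scores, and the toy reference set are all hypothetical.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_hits(ranked_pairs, known_pairs, k):
    """Count how many of the k highest-ranked disease pairs appear in a
    reference set; pairs are treated as unordered."""
    return sum(1 for pair in ranked_pairs[:k] if frozenset(pair) in known_pairs)


# Toy similarity scores and labels (1 = known similar pair), purely illustrative.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))

# Toy ranking, best first, and a toy reference set of pairs sharing a compound.
ranked = [("A", "B"), ("C", "D"), ("A", "C")]
known = {frozenset(("B", "A")), frozenset(("A", "C"))}
print(top_k_hits(ranked, known, k=2))
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a small illustration; a sort-based computation would be preferable for large benchmark sets.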

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Area Under Curve
  • Benchmarking
  • Biological Ontologies
  • Computational Biology / methods*
  • Databases, Factual
  • Disease / classification*
  • Disease / genetics
  • Drug Discovery
  • Drug Repositioning / methods*
  • Drug Repositioning / statistics & numerical data
  • Gene Ontology
  • Gene Regulatory Networks*
  • Humans
  • ROC Curve
  • Semantics