A cluster-based approach for semantic similarity in the biomedical domain

Conf Proc IEEE Eng Med Biol Soc. 2006:2006:2713-7. doi: 10.1109/IEMBS.2006.259235.

Abstract

We propose a new cluster-based semantic similarity/distance measure for the biomedical domain within the framework of UMLS. The proposed measure is based mainly on the cross-modified path length feature between the concept nodes, and on two new features: (1) the common specificity of two concept nodes, and (2) the local granularity of the clusters. For comparison purposes, we also applied five existing general-English ontology-based similarity measures to the biomedical domain within UMLS. The proposed measure was evaluated against human experts' ratings and compared with the existing techniques using two ontologies in UMLS (MeSH and SNOMED-CT). The experimental results confirmed the efficiency of the proposed method and showed that our similarity measure gives the best overall correlation with human ratings. We further show that using the MeSH ontology produces better correlations with human experts' scores than using SNOMED-CT, for all of the tested measures.
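
To make the features named in the abstract concrete, the following is a minimal sketch on a toy is-a hierarchy. It is not the authors' implementation: the abstract gives no formulas, so the hierarchy, the function names, the definition of common specificity as "maximum depth minus depth of the least common subsumer", and the logarithmic combination in `toy_distance` are all assumptions made for illustration. The sketch treats the whole tree as a single cluster and omits the local granularity feature.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parent); not taken from MeSH or SNOMED-CT.
PARENT = {
    "disease": None,
    "infection": "disease",
    "neoplasm": "disease",
    "bacterial infection": "infection",
    "viral infection": "infection",
    "carcinoma": "neoplasm",
}


def ancestors(node):
    """Return the chain from node up to the root, with node itself first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def lcs(a, b):
    """Least common subsumer: the deepest node that subsumes both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("concepts share no ancestor")


def path_edges(a, b):
    """Number of is-a edges on the shortest path from a to b through their LCS."""
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))


def common_specificity(a, b):
    """Assumed definition: how far the LCS sits above the deepest level of the tree.

    Smaller values mean the two concepts share more specific information.
    """
    max_depth = max(depth(n) for n in PARENT)
    return max_depth - depth(lcs(a, b))


def toy_distance(a, b, alpha=1.0, beta=1.0):
    """Assumed combination of path length and common specificity (illustrative only)."""
    return math.log(1 + path_edges(a, b) ** alpha * common_specificity(a, b) ** beta)


if __name__ == "__main__":
    pairs = [
        ("bacterial infection", "viral infection"),
        ("bacterial infection", "carcinoma"),
        ("carcinoma", "carcinoma"),
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: distance = {toy_distance(a, b):.3f}")
```

In this toy setup, two concepts that meet at a deep, specific ancestor get a smaller distance than two concepts that meet only at the root, even when both are leaves. That is the intuition behind adding a specificity term to plain path length.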

MeSH terms

  • Abstracting and Indexing / methods*
  • Artificial Intelligence*
  • Cluster Analysis*
  • Natural Language Processing*
  • Pattern Recognition, Automated / methods*
  • Unified Medical Language System*
  • Vocabulary, Controlled*