Concept embedding to measure semantic relatedness for biomedical information ontologies

J Biomed Inform. 2019 Jun:94:103182. doi: 10.1016/j.jbi.2019.103182. Epub 2019 Apr 19.

Abstract

There have been many attempts to identify relationships among concepts corresponding to terms from biomedical information ontologies such as the Unified Medical Language System (UMLS). In particular, vector representations of such concepts built from UMLS definition texts are widely used to measure the relatedness between two biological concepts. However, conventional relatedness measures cover only a limited range of terms, which limits the performance of these models. In this paper, we propose a concept-embedding model for measuring UMLS semantic relatedness that overcomes the limitations of earlier models. We obtained context texts for biological concepts that are not defined in UMLS by using Wikipedia as an external knowledge base. Concept vector representations were then derived from the context texts of the biological concepts. The degree of relatedness between two concepts was defined as the cosine similarity between the corresponding concept vectors. Our validation showed that the method provides higher coverage and better performance than the conventional method.
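The relatedness score described in the abstract, the cosine similarity between two concept vectors, can be illustrated with a minimal sketch. The concept names, the four-dimensional vectors, and the helper names `cosine_similarity` and `relatedness` are hypothetical and chosen only for illustration; they are not taken from the paper, whose vectors are learned from context texts.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical concept vectors with made-up values (assumption, not paper data).
# In the approach the abstract describes, such vectors would be derived from
# the context texts of each concept.
concept_vectors = {
    "concept_a": [0.9, 0.1, 0.3, 0.0],
    "concept_b": [0.8, 0.2, 0.4, 0.1],
    "concept_c": [0.0, 0.9, 0.0, 0.7],
}


def relatedness(name_1, name_2, vectors=concept_vectors):
    """Relatedness of two concepts as the cosine similarity of their vectors."""
    return cosine_similarity(vectors[name_1], vectors[name_2])


if __name__ == "__main__":
    print(relatedness("concept_a", "concept_b"))
    print(relatedness("concept_a", "concept_c"))
```

With these toy values, `concept_a` scores as more related to `concept_b` than to `concept_c`, because the first two vectors point in nearly the same direction.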

Keywords: Embedding; NLP; Paragraph vector; Similarity; UMLS; Wikipedia.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Ontologies*
  • Humans
  • Natural Language Processing
  • Semantics*
  • Unified Medical Language System