Integration of multiple terminology bases: a multi-view alignment method using the hierarchical structure

Bioinformatics. 2023 Nov 1;39(11):btad689. doi: 10.1093/bioinformatics/btad689.

Abstract

Motivation: In the medical field, multiple terminology bases coexist across different institutions and contexts, which often results in redundant terms. Identifying the terms that overlap among these bases holds significant potential for harmonizing multiple standards and establishing a unified framework, which in turn improves user access to comprehensive and well-structured medical information. However, most terminology bases differ not only semantically but also in the hierarchy of their classification systems. Conventional neighborhood-based approaches, such as graph convolutional networks (GCN), may introduce errors because matching terms can have different superordinate and subordinate terms in each base. It is therefore necessary to explore new methods that address this structural challenge.

Results: To address this heterogeneity issue, this paper proposes a multi-view alignment approach that incorporates the hierarchical structure of terminologies. We use a BERT-based model to capture the recursive relationships among different levels of the hierarchy and consider the interaction information of name, neighbors, and hierarchy between different terminologies. We test our method on mapping files of three open medical terminologies, and the experimental results demonstrate that our method outperforms baseline methods by 2% on the Hits@1 and Hits@10 metrics.
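
For readers unfamiliar with the evaluation metric, Hits@k is commonly defined as the fraction of source terms whose correct counterpart appears among the top k ranked candidates. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `hits_at_k`, the term identifiers, and the toy rankings are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def hits_at_k(ranked: Dict[str, List[str]], gold: Dict[str, str], k: int) -> float:
    """Return the fraction of source terms whose gold target is in the top-k candidates.

    ranked: for each source term, candidate target terms ordered best-first.
    gold:   for each source term, the single correct target term.
    """
    if not gold:
        return 0.0
    hits = sum(1 for src, tgt in gold.items() if tgt in ranked.get(src, [])[:k])
    return hits / len(gold)


# Hypothetical toy data: four source terms (A*) aligned against target terms (B*).
ranked = {
    "A1": ["B1", "B2", "B3"],
    "A2": ["B3", "B2", "B1"],
    "A3": ["B1", "B3", "B2"],
    "A4": ["B4", "B1", "B2"],
}
gold = {"A1": "B1", "A2": "B2", "A3": "B2", "A4": "B4"}

hits1 = hits_at_k(ranked, gold, 1)  # A1 and A4 are correct at rank 1
hits2 = hits_at_k(ranked, gold, 2)  # A2 is additionally recovered at rank 2
print(f"Hits@1 = {hits1:.2f}, Hits@2 = {hits2:.2f}")
```

In this toy case Hits@1 is 0.50 and Hits@2 is 0.75: widening the cutoff can only keep or raise the score, which is why Hits@10 is always at least as high as Hits@1.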

Availability and implementation: The source code will be available at https://github.com/Ulricab/Bert-Path upon publication.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Reference Standards
  • Semantics
  • Software*
  • Vocabulary, Controlled*