Deep-learning-based automated terminology mapping in OMOP-CDM

J Am Med Inform Assoc. 2021 Jul 14;28(7):1489-1496. doi: 10.1093/jamia/ocab030.

Abstract

Objective: Accessing medical data from multiple institutions is difficult owing to the interinstitutional diversity of vocabularies. Standardization schemes, such as the common data model, have been proposed as solutions to this problem, but such schemes require expensive human supervision. This study aims to construct a trainable system that can automate the process of semantic interinstitutional code mapping.

Materials and methods: To automate mapping between source and target codes, we compute the embedding-based semantic similarity between corresponding descriptive sentences. We also implement a systematic approach for preparing training data for similarity computation. Experimental results are compared to traditional word-based mappings.

Results: The proposed model is compared against the state-of-the-art automated matching system, which is called Usagi, of the Observational Medical Outcomes Partnership common data model. By incorporating multiple negative training samples per positive sample, our semantic matching method significantly outperforms Usagi. Its matching accuracy is at least 10% greater than that of Usagi, and this trend is consistent across various top-k measurements.

Discussion: The proposed deep learning-based mapping approach outperforms previous simple word-level matching algorithms because it can account for contextual and semantic information. Additionally, we demonstrate that the manner in which negative training samples are selected significantly affects the overall performance of the system.

Conclusion: Incorporating the semantics of code descriptions more significantly increases matching accuracy compared to traditional text co-occurrence-based approaches. The negative training sample collection methodology is also an important component of the proposed trainable system that can be adopted in both present and future related systems.

Keywords: automated mapping; common data model; deep-learning; embedding; terminology mapping.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Deep Learning*
  • Humans
  • Language
  • Semantics