From electronic health records to terminology base: A novel knowledge base enrichment approach

J Biomed Inform. 2021 Jan:113:103628. doi: 10.1016/j.jbi.2020.103628. Epub 2020 Nov 21.

Abstract

Enriching a terminology base (TB) is an important and continuous process, since formal terms can be renamed and new term aliases emerge all the time. Electronic health records (EHRs), a fundamental source for clinical research and practice, are a potential supplementary source for TB enrichment. Aligning the set of external terms in EHRs to a TB can be regarded as entity alignment without structural information. Conventional approaches mainly use the internal structural information of multiple knowledge bases (KBs) to map entities to their counterparts across KBs. However, the external terms in EHRs are independent clinical terms that lack interrelations. To achieve entity alignment in this setting, we proposed a novel automatic TB enrichment approach, named semantic & structure embeddings-based relevancy prediction (S2ERP). To obtain the semantic embeddings of external terms, we fed each of them, paired with a formal entity, into a pre-trained language model. Meanwhile, a graph convolutional network was used to obtain the structure embeddings of the synonyms and hyponyms in the TB. S2ERP then combines both embeddings to measure relevancy. Experimental results on a clinical indicator TB, collected from 38 top-class hospitals of the Shanghai Hospital Development Center, showed that the proposed approach outperforms baseline methods by 14.16% in Hits@1.
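The pipeline the abstract describes (structure embeddings from a graph convolutional network, semantic embeddings for term/entity pairs, a combined relevancy score, and evaluation by Hits@1) can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It assumes a standard GCN layer of the form ReLU(D^-1/2 (A + I) D^-1/2 H W), uses hand-picked vectors as stand-ins for pre-trained language model outputs, and fuses the two embeddings with a logistic score over their concatenation; the abstract does not say how S2ERP actually combines them. The graph, feature values, weights, and all function names (`normalized_adjacency`, `gcn_layer`, `relevancy`, `hits_at_k`) are hypothetical.

```python
import math

# Toy, hypothetical sketch of the S2ERP idea; not the authors' implementation.


def normalized_adjacency(n, edges):
    """Standard GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:  # undirected synonym/hyponym links in the TB
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One GCN layer: ReLU(A_hat H W) yields structure embeddings of TB entities."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def relevancy(sem_pair, struct_entity, weights, bias=0.0):
    """Assumed fusion: logistic score over the [semantic ; structure] concatenation."""
    z = sum(w * x for w, x in zip(weights, sem_pair + struct_entity)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def hits_at_k(ranked, gold, k=1):
    """Fraction of external terms whose correct entity is in their top-k list."""
    return sum(g in r[:k] for r, g in zip(ranked, gold)) / len(gold)


# Three TB entities linked 0-1 and 1-2 (hypothetical graph and features).
a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
struct = gcn_layer(a_hat, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                   [[1.0, 0.0], [0.0, 1.0]])

# Stand-ins for language-model outputs on (external term, formal entity) pairs.
sem = [
    [[1.0, 0.8], [0.2, 0.1], [0.1, 0.0]],  # external term 0 vs entities 0..2
    [[0.0, 0.1], [0.3, 0.2], [0.9, 0.7]],  # external term 1 vs entities 0..2
]
weights = [2.0, 2.0, 0.5, 0.5]  # hypothetical fusion weights

# Rank candidate entities for each external term by relevancy, best first.
ranked = [sorted(range(3), key=lambda e: relevancy(sem[t][e], struct[e], weights),
                 reverse=True) for t in range(2)]
gold = [0, 2]  # hypothetical correct entity for each external term
print(ranked, hits_at_k(ranked, gold, k=1))  # prints [[0, 1, 2], [2, 1, 0]] 1.0
```

Hits@1 only credits a term when its correct entity is ranked first, so in this toy setup both terms count as hits. Because the logistic function is monotonic, ranking by the raw linear score would give the same order; the sigmoid only makes the score readable as a relevancy probability.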

Keywords: Entity alignment; Graph convolutional network; Knowledge base; Pre-trained language model; Terminology enriching.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China
  • Electronic Health Records*
  • Knowledge Bases*
  • Natural Language Processing
  • Semantics