A simple neural vector space model for medical concept normalization using concept embeddings

J Biomed Inform. 2022 Jun:130:104080. doi: 10.1016/j.jbi.2022.104080. Epub 2022 Apr 23.

Abstract

Objective: Medical concept normalization (MCN), the task of linking textual mentions to concepts in an ontology, provides a solution to unify different ways of referring to the same concept. In this paper, we present a simple neural MCN model that takes mentions as input and directly predicts concepts.

Materials and methods: We evaluate our proposed model on clinical datasets from the ShARe/CLEF eHealth 2013 shared task and the 2019 n2c2/OHNLP shared task track 3. Our neural MCN model consists of an encoder and a normalized temperature-scaled softmax (NT-softmax) layer that maximizes the cosine similarity score of matching the mention to the correct concept. We adopt SAPBERT as the encoder and initialize the weights in the NT-softmax layer with pre-computed concept embeddings from SAPBERT.
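To make the NT-softmax layer concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the layer L2-normalizes both the mention vector and each concept weight vector, divides the resulting cosine similarities by a temperature, and applies a softmax. The function names, the temperature value of 0.1, and the toy three-dimensional vectors are all illustrative assumptions; in the paper the mention vector would come from the SAPBERT encoder and the weights from pre-computed SAPBERT concept embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_scores(mention_vec, concept_weights):
    """Cosine similarity between one mention vector and every concept vector."""
    m = l2_normalize(mention_vec)
    return [
        sum(a * b for a, b in zip(m, l2_normalize(w)))
        for w in concept_weights
    ]


def nt_softmax(mention_vec, concept_weights, temperature=0.1):
    """Softmax over temperature-scaled cosine similarities (assumed form)."""
    logits = [s / temperature for s in cosine_scores(mention_vec, concept_weights)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nt_softmax_loss(mention_vec, concept_weights, gold_index, temperature=0.1):
    """Cross-entropy loss; minimizing it raises the gold concept's cosine score."""
    probs = nt_softmax(mention_vec, concept_weights, temperature)
    return -math.log(probs[gold_index])


# Toy data (hypothetical): three concept embeddings and one mention embedding.
concept_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mention_vec = [0.2, 3.0, 0.1]

probs = nt_softmax(mention_vec, concept_weights)
predicted = max(range(len(probs)), key=probs.__getitem__)
loss_gold = nt_softmax_loss(mention_vec, concept_weights, gold_index=1)
loss_wrong = nt_softmax_loss(mention_vec, concept_weights, gold_index=0)
```

Because both sides are normalized, the logits depend only on the angle between the mention and concept vectors, which is what allows the weight matrix to be initialized directly from concept embeddings.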

Results: Our proposed neural model achieves competitive performance on ShARe/CLEF 2013 and establishes a new state-of-the-art on 2019-n2c2-MCN. Yet this model is simpler than most prior work: it requires no complex pipelines, no hand-crafted rules, and no preprocessing, which makes it easier to apply in new settings.

Discussion: Analyses of our proposed model show that the NT-softmax is better than the conventional softmax on the MCN task, and both the CUI-less threshold parameter and the initialization of the weight vectors in the NT-softmax layer contribute to the improvements.
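The abstract does not say how the CUI-less threshold is applied. One plausible reading, sketched below purely as an assumption, is that a mention is labeled CUI-less when its best cosine similarity to any concept falls below the threshold. The function name, the threshold value of 0.8, and the concept identifiers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_with_cuiless(mention_vec, concept_vecs, concept_ids, threshold=0.8):
    """Return the best-matching concept, or "CUI-less" below the threshold.

    Assumption: the threshold is compared against the maximum cosine score.
    """
    scores = [cosine(mention_vec, c) for c in concept_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    if scores[best] < threshold:
        return "CUI-less", scores[best]
    return concept_ids[best], scores[best]


# Hypothetical concept identifiers and toy two-dimensional embeddings.
concept_ids = ["C_A", "C_B"]
concept_vecs = [[1.0, 0.0], [0.0, 1.0]]

near = predict_with_cuiless([0.9, 0.1], concept_vecs, concept_ids)  # close to C_A
far = predict_with_cuiless([1.0, 1.0], concept_vecs, concept_ids)   # equidistant from both
```

Under this reading, the threshold trades off false links against missed concepts, so it would typically be tuned on development data.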

Conclusion: We propose a simple neural model for clinical MCN, a one-step approach with simpler inference and better performance than prior work. Our analyses demonstrate that future work on MCN may require more effort on unseen concepts.

Keywords: Deep Learning; Medical Concept Normalization; Natural Language Processing; Normalized Temperature-scaled Softmax; Vector Space Model.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Space Simulation*