Mapping SNOMED CT Codes to Semi-Structured Texts via an NLP Pipeline

Stud Health Technol Inform. 2022 Jun 29:295:390-393. doi: 10.3233/SHTI220747.

Abstract

In the project presented here, we used NLP tools to annotate German medical training documents with SNOMED CT codes. We addressed the following research question: Is it possible to automate the annotation of training documents with an NLP pipeline that is specifically designed for this task but requires translation into English? The goal of our stakeholder, an institution responsible for the continuing education of physicians, was to facilitate switching between different medical training programs by coding the same requirement with the same SNOMED CT code, even if the wording differs. We first describe how we chose the NLP tools and then outline the steps for implementing our prototype: the construction of the NLP pipeline, its implementation, and its validation. We infer three important lessons from our results: (i) self-supervision is no free lunch and should be based on a sophisticated task, (ii) translation via DeepL can be too context-dependent for a specialized use case, and (iii) ontology extraction can increase efficiency as well as accuracy.
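
The translate-then-annotate idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not use the DeepL or MedCAT APIs: the translation table, the lexicon, and the names `translate`, `annotate`, and `codes_for` are hypothetical stand-ins chosen for illustration. The sketch only shows how two differently worded German requirements could end up with the same concept code once both are translated into English and matched against a shared vocabulary.

```python
from dataclasses import dataclass
from typing import List, Set

# Hypothetical stand-in for the machine-translation step (the paper uses DeepL).
# A tiny lookup table keeps the sketch self-contained and offline.
TOY_TRANSLATIONS = {
    "Behandlung des Herzinfarkts": "treatment of heart attack",
    "Therapie des Myokardinfarkts": "therapy of myocardial infarction",
}

# Toy concept lexicon: English surface form -> concept code.
# 22298006 is used here as the SNOMED CT identifier for myocardial infarction;
# verify any code against a licensed SNOMED CT release before relying on it.
LEXICON = {
    "heart attack": "22298006",
    "myocardial infarction": "22298006",
}


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    surface: str  # matched surface form (lower-cased)
    code: str     # concept code assigned to the match


def translate(text_de: str) -> str:
    """Look up the English translation of a German requirement (toy step)."""
    return TOY_TRANSLATIONS[text_de]


def annotate(text_en: str) -> List[Annotation]:
    """Find the first occurrence of each lexicon entry by substring matching."""
    lowered = text_en.lower()
    found = []
    for surface, code in LEXICON.items():
        start = lowered.find(surface)
        if start != -1:
            found.append(Annotation(start, start + len(surface), surface, code))
    return sorted(found, key=lambda a: a.start)


def codes_for(text_de: str) -> Set[str]:
    """Translate a German requirement and return the set of matched codes."""
    return {a.code for a in annotate(translate(text_de))}


if __name__ == "__main__":
    for requirement in TOY_TRANSLATIONS:
        print(requirement, "->", sorted(codes_for(requirement)))
```

In this toy setting both requirements resolve to the same code, which is the property the stakeholder is after. A real concept annotator would additionally have to handle ambiguity, inflection, and translation variance, which plain substring matching ignores.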

Keywords: MedCAT; NLP; SNOMED CT; spaCy; word embeddings.

MeSH terms

  • Systematized Nomenclature of Medicine*