Applying and Improving a Publicly Available Medication NER Pipeline in a Clinical Cancer EMR

Stud Health Technol Inform. 2024 Jan 25;310:679-684. doi: 10.3233/SHTI231051.

Abstract

Clinical natural language processing (NLP) can extract medication information from free-text notes in electronic medical records (EMRs) using named entity recognition (NER) pipelines. Publicly available annotated data for clinical NLP are scarce, and research annotation budgets are often low. Fine-tuning pre-trained pipelines containing a Transformer layer can produce quality results with relatively small training corpora. We examine the transferability of a publicly available, pre-trained NER pipeline with a Transformer layer for medication targets. The pipeline performs poorly when validated directly, without fine-tuning, but achieves an F1-score of 92% for drug names after fine-tuning with 1,565 annotated samples from a clinical cancer EMR, highlighting the benefits of the Transformer architecture in this setting. Performance was largely influenced by inconsistent annotation, reinforcing the need for innovative annotation processes in clinical NLP applications.
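
For readers unfamiliar with the metric, the following is a minimal sketch of how an entity-level F1-score for NER output is commonly computed. The abstract does not state the matching criterion used in the study, so exact match on span boundaries and label is an assumption here, and the function name `entity_prf`, the labels, and the example spans are all hypothetical.

```python
from typing import Set, Tuple

# A predicted or annotated entity: (start_char, end_char, label).
Span = Tuple[int, int, str]


def entity_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-label matching.

    Assumption: an entity counts as correct only if its boundaries and
    label both match an annotated entity exactly.
    """
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations for one note (not data from the study).
gold = {(8, 19, "DRUG"), (30, 41, "DRUG"), (42, 47, "DOSE")}
pred = {(8, 19, "DRUG"), (30, 41, "DRUG"), (50, 55, "DOSE")}

precision, recall, f1 = entity_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under exact matching, a prediction whose boundaries differ from the annotation by even one character counts as both a false positive and a false negative, which is one way inconsistent annotation can depress a reported score.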

Keywords: Natural language processing; Transformers; electronic medical records; medications; named entity recognition.

MeSH terms

  • Budgets*
  • Drug Delivery Systems
  • Electric Power Supplies
  • Humans
  • Neoplasms* / drug therapy