Do You Need Embeddings Trained on a Massive Specialized Corpus for Your Clinical Natural Language Processing Task?

Stud Health Technol Inform. 2019 Aug 21;264:1558-1559. doi: 10.3233/SHTI190533.

Abstract

We explore the impact of the data source on word representations for two clinical NLP tasks in French: natural language understanding and text classification. We compare word embeddings (fastText) and language models (ELMo), trained either on general-domain data (Wikipedia) or on specialized data (electronic health records, EHR). The best results were obtained with ELMo representations trained on EHR data for one of the two tasks (gains of +7% and +8% in F1-score).
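To make the reported metric concrete, here is a minimal sketch of how an F1-score gain between two systems could be computed. The labels and predictions are made-up toy data, not results from the study, and the names `pred_general` and `pred_specialized` are hypothetical. The abstract does not state whether F1 was micro- or macro-averaged or whether the gains are absolute or relative, so the sketch assumes a binary F1 and an absolute difference.

```python
def f1_score(gold, pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data only (assumption): gold labels and the outputs of two
# hypothetical classifiers, one using general-domain representations
# and one using representations trained on specialized data.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
pred_general = [1, 1, 0, 0, 1, 0, 0, 0]
pred_specialized = [1, 1, 1, 0, 1, 0, 0, 0]

f1_general = f1_score(gold, pred_general)
f1_specialized = f1_score(gold, pred_specialized)

# Absolute gain in F1 (assumed interpretation of "gain").
gain = f1_specialized - f1_general

print(f"general: {f1_general:.3f}  specialized: {f1_specialized:.3f}  gain: {gain:+.3f}")
```

In practice the same comparison would be run on each task's held-out test set, with the averaging scheme chosen to match the task (for example, per-class F1 averaged over classes for multi-class text classification).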

Keywords: Natural language processing; electronic health records.

MeSH terms

  • Electronic Health Records*
  • Histological Techniques
  • Language
  • Natural Language Processing*