Combine Factual Medical Knowledge and Distributed Word Representation to Improve Clinical Named Entity Recognition

AMIA Annu Symp Proc. 2018 Dec 5:2018:1110-1117. eCollection 2018.

Abstract

There has been increasing interest in developing deep learning methods to recognize clinical concepts in narrative clinical text. Recently, several studies have reported that Recurrent Neural Networks (RNNs) outperform traditional machine learning methods such as Conditional Random Fields (CRFs). Deep learning-based Named Entity Recognition (NER) systems often use statistical language models to learn word embeddings from unlabeled corpora. However, current word embedding methods struggle to learn good representations for low-frequency words. Medicine is a knowledge-intensive domain; existing medical knowledge has the potential to improve feature representations for less frequent yet important words. It remains unclear, however, how existing medical knowledge can help deep learning models in clinical NER tasks. In this study, we integrated medical knowledge from the Unified Medical Language System (UMLS) with word embeddings trained on an unlabeled clinical corpus in RNNs for the detection of problems, treatments, and lab tests. We examined three ways of generating medical knowledge features: a dictionary lookup program, the KnowledgeMap system, and the MedLEE system. We also compared representing medical knowledge as one-hot vectors versus as embedding layers. The evaluation results showed that the RNN with medical knowledge as embedding layers achieved new state-of-the-art performance (a strict F1 score of 86.21% and a relaxed F1 score of 92.80%) on the 2010 i2b2 corpus, outperforming both an RNN with only word embeddings and RNNs with medical knowledge as one-hot vectors. This study demonstrates an efficient way of integrating medical knowledge with distributed word representations for clinical NER.
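The abstract's central comparison, representing a token's medical-knowledge tag as a one-hot vector versus a learned embedding before concatenating it with the word embedding, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tag set, dimensions, and random lookup tables are all hypothetical stand-ins (the actual knowledge features came from a dictionary lookup program, KnowledgeMap, or MedLEE, and the embeddings were trained, not random).

```python
import numpy as np

# Hypothetical BIO-style knowledge tags for the three target entity classes;
# the paper's real feature set derived from UMLS differs.
SEM_TAGS = ["O", "B-problem", "I-problem", "B-treatment",
            "I-treatment", "B-test", "I-test"]
TAG2ID = {t: i for i, t in enumerate(SEM_TAGS)}

WORD_DIM = 50      # word-embedding size (illustrative)
KNOW_DIM = 10      # knowledge-embedding size (illustrative)

rng = np.random.default_rng(0)
# Stand-ins for a pretrained word-embedding table and a trainable
# knowledge-embedding layer.
word_embeddings = rng.normal(size=(1000, WORD_DIM))
knowledge_embeddings = rng.normal(size=(len(SEM_TAGS), KNOW_DIM))

def one_hot_feature(tag: str) -> np.ndarray:
    """Knowledge tag as a sparse one-hot indicator vector."""
    v = np.zeros(len(SEM_TAGS))
    v[TAG2ID[tag]] = 1.0
    return v

def rnn_input(word_id: int, tag: str, mode: str = "embedding") -> np.ndarray:
    """Concatenate the word embedding with the knowledge feature,
    forming one time step of the RNN's input sequence."""
    word_vec = word_embeddings[word_id]
    if mode == "one-hot":
        know_vec = one_hot_feature(tag)          # fixed, sparse
    else:
        know_vec = knowledge_embeddings[TAG2ID[tag]]  # dense, learnable
    return np.concatenate([word_vec, know_vec])
```

With these toy sizes, the one-hot variant yields a 57-dimensional input (50 + 7) and the embedding variant a 60-dimensional one (50 + 10); in the paper, the embedding variant is the one that is jointly trained with the RNN and gave the best F1 scores.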

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Deep Learning*
  • Humans
  • Natural Language Processing
  • Neural Networks, Computer*
  • Unified Medical Language System