Chinese clinical named entity recognition with radical-level feature and self-attention mechanism

J Biomed Inform. 2019 Oct:98:103289. doi: 10.1016/j.jbi.2019.103289. Epub 2019 Sep 18.

Abstract

Named entity recognition is a fundamental task in medical natural language processing. In the medical domain, Chinese clinical named entity recognition identifies the boundaries and types of medical entities in unstructured text such as electronic medical records. Recently, a model combining bidirectional Long Short-Term Memory networks (BiLSTMs) with a conditional random field (BiLSTM-CRF), based on character-level semantics, has achieved great success in Chinese clinical named entity recognition tasks. However, this method captures only the contextual semantics between characters in a sentence. Chinese characters are hieroglyphs, and deeper semantic information is hidden inside them, which the BiLSTM-CRF model fails to capture. In addition, some entities in a sentence depend on one another, but Long Short-Term Memory (LSTM) does not perfectly capture long-term dependencies between characters. To address these problems, we propose a BiLSTM-CRF model based on radical-level features and a self-attention mechanism. We use a convolutional neural network (CNN) to extract radical-level features, aiming to capture the intrinsic and internal relevance of characters. In addition, we use a self-attention mechanism to capture dependencies between characters regardless of their distance. Experiments show that our model achieves F1-scores of 93.00% and 86.34% on the CCKS-2017 and TP_CNER datasets, respectively.
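The self-attention step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain scaled dot-product attention in which queries, keys, and values are the input vectors themselves (no learned projections), and the function names and toy hidden states are hypothetical. It shows only why attention weights depend on the content of two positions and not on the distance between them.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(states):
    """Scaled dot-product self-attention over a sequence of vectors.

    Assumption: queries, keys and values are the input states themselves
    (no learned projection matrices), which keeps the sketch short.
    """
    d = len(states[0])
    outputs, weights = [], []
    for q in states:
        # Similarity of this position to every position, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in states]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of all states.
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, states)) for i in range(d)])
    return outputs, weights

# Toy hidden states for a 4-character sentence (made-up numbers standing in
# for BiLSTM outputs); positions 0 and 3 are identical on purpose.
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
outputs, weights = self_attention(states)
```

In this toy input, position 0 attends to position 3 exactly as strongly as to itself, although the two are three characters apart, because the score is computed from the vectors alone.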

Keywords: Chinese clinical named entity recognition; Radical-level feature; Self-attention.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Attention
  • China
  • Electronic Health Records*
  • Humans
  • Language
  • Medical Informatics / methods
  • Natural Language Processing*
  • Neural Networks, Computer*
  • Pattern Recognition, Automated
  • Semantics
  • Text Messaging