A hybrid approach for named entity recognition in Chinese electronic medical record

BMC Med Inform Decis Mak. 2019 Apr 9;19(Suppl 2):64. doi: 10.1186/s12911-019-0767-2.

Abstract

Background: With the rapid spread of electronic medical records and the arrival of medical big data era, the application of natural language processing technology in biomedicine has become a hot research topic.

Methods: In this paper, firstly, BiLSTM-CRF model is applied to medical named entity recognition on Chinese electronic medical record. According to the characteristics of Chinese electronic medical records, obtain the low-dimensional word vector of each word in units of sentences. And then input the word vector to BiLSTM to realize automatic extraction of sentence features. And then CRF performs sentence-level word tagging. Secondly, attention mechanism is added between the BiLSTM and the CRF to construct Attention-BiLSTM-CRF model, which can leverage document-level information to alleviate tagging inconsistency. In addition, this paper proposes an entity auto-correct algorithm to rectify entities according to historical entity information. At last, a drug dictionary and post-processing rules are well-built to rectify entities, to further improve performance.

Results: The final F1 scores of the BiLSTM-CRF and Attention-BiLSTM-CRF model on given test dataset are 90.15 and 90.82% respectively, both of which are higher than 89.26%, which is the best F1 score on the test dataset except ours.

Conclusion: Our approach can be used to recognize medical named entity on Chinese electronic medical records and achieves the state-of-the-art performance on the given test dataset.

Keywords: Attention; BiLSTM-CRF; Chinese electronic medical record; Drug dictionary; Named entity recognition.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • China
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval*
  • Language
  • Natural Language Processing*