Automatic ICD code assignment of Chinese clinical notes based on multilayer attention BiRNN

J Biomed Inform. 2019 Mar:91:103114. doi: 10.1016/j.jbi.2019.103114. Epub 2019 Feb 12.

Abstract

The International Classification of Diseases (ICD) code is an important label in electronic health records. Automatic ICD code assignment from the narrative text of clinical documents is an essential task that has drawn much attention recently. When the input corpus consists of Chinese clinical notes, the nature of the Chinese language raises issues that need to be considered, such as the accuracy of word segmentation and the representation of single Chinese characters, which carry semantic meaning. Taking the length of patient notes and the representation of Chinese words into account, we present a multilayer attention bidirectional recurrent neural network (MA-BiRNN) model for assigning disease codes. A hierarchical approach is used to represent the features of discharge summaries without manual feature engineering. Combining character-level and word-level embeddings can improve the representation of words. An attention mechanism is introduced into bidirectional long short-term memory networks, which helps to address the performance drop that plain recurrent neural networks suffer on long text sequences. The experiment is carried out on a real-world dataset containing 7732 admission records in Chinese and 1177 unique ICD-10 labels. The proposed model achieves F1-scores of 0.639 and 0.766 on full-level and block-level codes, respectively. It outperforms the baseline neural network models and achieves the lowest Hamming loss. Ablation analysis indicates that the multilevel attention mechanism plays a decisive role in the system for dealing with Chinese clinical notes.
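
The abstract does not give the exact attention formulation, so the following is only a minimal sketch of one common way to pool a sequence of recurrent hidden states with attention: project each state, score it against a learned context vector, normalize the scores with softmax, and take the weighted sum. The function name `attention_pool` and the parameters `W`, `b`, and `u` are hypothetical and chosen for illustration; the hidden states here are random stand-ins for BiLSTM outputs, and the paper's actual layers may differ.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, W, b, u):
    """Pool T hidden states of size d into one vector of size d.

    hidden_states: list of T vectors, each of length d
    W: d x a projection matrix (hypothetical parameter)
    b: bias vector of length a (hypothetical parameter)
    u: context vector of length a (hypothetical parameter)
    Returns (pooled_vector, attention_weights).
    """
    scores = []
    for h in hidden_states:
        # Project the hidden state and squash it with tanh.
        projected = [
            math.tanh(sum(h[i] * W[i][j] for i in range(len(h))) + b[j])
            for j in range(len(b))
        ]
        # Score the projection against the context vector.
        scores.append(sum(p * c for p, c in zip(projected, u)))
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Weighted sum of the original hidden states.
    pooled = [
        sum(w * h[k] for w, h in zip(weights, hidden_states))
        for k in range(dim)
    ]
    return pooled, weights


# Toy usage: 6 time steps, hidden size 8, attention size 4.
random.seed(0)
T, d, a = 6, 8, 4
H = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
W = [[random.gauss(0, 1) for _ in range(a)] for _ in range(d)]
b = [0.0] * a
u = [random.gauss(0, 1) for _ in range(a)]

pooled, weights = attention_pool(H, W, b, u)
```

In a hierarchical setup of the kind the abstract describes, the same pooling step could in principle be applied at more than one level (for example, over words within a sentence and then over sentences within a note), with separate parameters at each level; that layering is an assumption here, not a detail confirmed by the abstract.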

Keywords: Character-enhanced; Clinical notes; ICD code; Multilayer attention.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Automation
  • China
  • Datasets as Topic
  • Electronic Health Records*
  • International Classification of Diseases*
  • Machine Learning