A weakly supervised method for named entity recognition of Chinese electronic medical records

Med Biol Eng Comput. 2023 Oct;61(10):2733-2743. doi: 10.1007/s11517-023-02871-6. Epub 2023 Jul 15.

Abstract

The field of Chinese medical natural language processing faces a significant challenge in training accurate entity recognition models due to the limited availability of high-quality labeled data. In response, we propose a joint training model, MCBERT-GCN-CRF, which achieves high performance in identifying medical entities in Chinese electronic medical records. Additionally, we introduce CM-NER, a 5-step framework that effectively mitigates the effects of noise in weakly labeled data and establishes a principled connection between supervised and weakly supervised named entity recognition. We demonstrate significant improvements in recall and accuracy. By suppressing noise in weakly labeled data, our approach outperforms traditional fully supervised pre-trained models and other state-of-the-art methods. Our proposed framework achieves an F1 score of 86.29% on the CCKS-2019 dataset, significantly higher than pre-trained model baselines, which range from 74.17% to 83.06%, and higher than the top-performing supervised named entity recognition models in the CCKS-2019 competition. Our results demonstrate the effectiveness of the proposed framework and highlight the potential of leveraging unlabeled data to train accurate models for named entity recognition in Chinese medical natural language processing. This research has significant implications for advancing natural language processing techniques in the medical domain and improving patient care.
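
For readers unfamiliar with how an F1 score such as the one reported above is usually computed for named entity recognition, the following is a minimal sketch. It assumes BIO-style tags and strict matching (a predicted entity counts only if its boundaries and type both equal a gold entity); the abstract does not state the matching criterion used for CCKS-2019, and the label names `DIS` and `DRUG` and the helper names `bio_to_spans` and `prf1` are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# An entity span: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed entity spans.

    Assumption: an "I-" tag that does not continue an open entity of the
    same type is ignored rather than starting a new entity.
    """
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if continues:
            continue
        if etype is not None:
            spans.add((start, i, etype))
        start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def prf1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall, and F1."""
    tp = len(gold & pred)  # exact boundary-and-type matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy example: two gold entities, one of them recovered.
gold_tags = ["B-DIS", "I-DIS", "O", "B-DRUG"]
pred_tags = ["B-DIS", "I-DIS", "O", "O"]
precision, recall, f1 = prf1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
```

In this toy case the prediction recovers one of two gold entities with no false positives, so precision is 1.0 and recall is 0.5. Because F1 is the harmonic mean of the two, a gain in recall alone raises it when precision holds steady.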

Keywords: Electronic medical records; Named entity recognition; Natural language processing; Weakly-supervised learning.

MeSH terms

  • China
  • Electronic Health Records*
  • Humans
  • Language
  • Natural Language Processing*