An imConvNet-based deep learning model for Chinese medical named entity recognition

Yuchen Zheng; Zhenggong Han; Yimin Cai; Xubo Duan; Jiangling Sun; Wei Yang; Haisong Huang

doi:10.1186/s12911-022-02049-4

BMC Med Inform Decis Mak. 2022 Nov 21;22(1):303. doi: 10.1186/s12911-022-02049-4.

Authors

Yuchen Zheng^#¹, Zhenggong Han^#², Yimin Cai¹, Xubo Duan¹, Jiangling Sun³, Wei Yang⁴, Haisong Huang⁵

Affiliations

¹ Medical College, Guizhou University, Guiyang, 550025, Guizhou, China.
² Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang, 550025, Guizhou, China.
³ Guiyang Hospital of Stomatology, Guiyang, 550002, Guizhou, China.
⁴ Medical College, Guizhou University, Guiyang, 550025, Guizhou, China. vyang@gzu.edu.cn.
⁵ Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang, 550025, Guizhou, China. hshuang@gzu.edu.cn.

^# Contributed equally.

Abstract

Background: With advances in medical technology, information management in the medical field has become increasingly mature. Medical big data analysis relies on the large volume of medical and health data stored in electronic medical systems, such as electronic medical records and medical reports. How to fully exploit the information contained in these data has long been a subject of research. Named entity recognition (NER) is the basis of text mining, and it has its own particularities in the medical field: scarce text resources and a large number of domain-specific terms continue to pose significant challenges for medical NER.

Methods: We propose an improved convolutional neural network (imConvNet) that extracts additional text features. We combine it with the classical pre-trained BERT model and a BiLSTM model for named entity recognition, using imConvNet to extract additional word-vector features and thereby improve recognition accuracy. The proposed model, BERT-imConvNet-BiLSTM-CRF, consists of four layers: a BERT embedding layer, which produces the word embedding vectors; an imConvNet layer, which captures the context features of each character; a BiLSTM (Bidirectional Long Short-Term Memory) layer, which captures long-distance dependencies; and a CRF (Conditional Random Field) layer, which labels characters based on their features and tag-transition rules.
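The abstract does not describe imConvNet's internal improvements or how the CRF layer is implemented. As a rough illustration only, the plain-Python sketch below shows two generic operations behind the second and fourth layers: a standard 1D convolution with zero ("same") padding, which mixes each character's vector with those of its neighbours, and Viterbi decoding, the usual way a linear-chain CRF selects the best tag sequence from per-character emission scores and tag-transition scores. It is not the authors' imConvNet or their code; the function names `conv1d_same` and `viterbi`, the single-type BIO tag set, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def conv1d_same(seq: Matrix, kernel: Sequence[Matrix]) -> List[List[float]]:
    """Plain 1D convolution with zero padding ('same' output length).

    seq:    T x D_in character vectors (e.g. BERT outputs for one sentence).
    kernel: K x D_in x D_out weights with K odd, so output t mixes the
            K characters centred on character t.
    """
    half = len(kernel) // 2
    d_in, d_out = len(seq[0]), len(kernel[0][0])
    out = []
    for t in range(len(seq)):
        acc = [0.0] * d_out
        for j, weights in enumerate(kernel):
            pos = t + j - half
            if not 0 <= pos < len(seq):
                continue  # padding positions contribute zero
            for i in range(d_in):
                for o in range(d_out):
                    acc[o] += seq[pos][i] * weights[i][o]
        out.append(acc)
    return out


def viterbi(emissions: Matrix, transitions: Matrix) -> Tuple[List[int], float]:
    """Highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][y]:   score of tag y at character t (from the encoder layers).
    transitions[a][b]: score of tag a being followed by tag b.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for b in range(n_tags):
            # best previous tag a for current tag b
            best_a = max(range(n_tags), key=lambda a: score[a] + transitions[a][b])
            new_score.append(score[best_a] + transitions[best_a][b] + emissions[t][b])
            pointers.append(best_a)
        score = new_score
        backpointers.append(pointers)
    best_last = max(range(n_tags), key=lambda y: score[y])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])  # follow back-pointers to the start
    path.reverse()
    return path, score[best_last]


# Hypothetical toy example with tags O, B, I for a single entity type.
TAGS = ["O", "B", "I"]
NEG = -1e4  # large penalty encoding the rule "O cannot be followed by I"
TRANSITIONS = [[0.0, 0.0, NEG], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
EMISSIONS = [[2.0, 1.0, 0.0], [0.0, 0.5, 1.0], [0.0, 0.0, 1.0]]

path, best_score = viterbi(EMISSIONS, TRANSITIONS)
print([TAGS[y] for y in path])  # prints ['O', 'B', 'I']
```

Picking the highest emission at each character independently would give O, I, I, which breaks the BIO scheme; the transition penalty makes the decoder return O, B, I instead. This is the sense in which a CRF layer applies transition rules on top of per-character features. For the convolution, `conv1d_same([[1.0], [2.0], [3.0]], [[[1.0]], [[1.0]], [[1.0]]])` sums each character with its immediate neighbours and returns `[[3.0], [6.0], [5.0]]`.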

Results: Combined with the classical models, the proposed model reached an average F1 score of 91.38% on the public medical dataset yidu-s4k; with real electronic medical record texts on impacted wisdom teeth as the experimental object, its F1 score was 93.89%. Both results are better than those of the classical models.
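The abstract reports F1 scores without stating how they are computed. A common NER convention is exact-match, entity-level scoring: a predicted entity counts as correct only if its type and both boundaries match the gold annotation. The plain-Python sketch below computes micro-averaged precision, recall and F1 under that assumption for BIO-tagged sentences. Whether the paper's "average F1" is micro- or macro-averaged, and which tag scheme it uses, is not stated here; the entity types `DISEASE` and `DRUG` and the helper names `bio_to_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's BIO tags into a set of entity spans."""
    spans: Set[Span] = set()
    start: int = 0
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # current entity continues
        if etype is not None:
            spans.add((etype, start, i))
            etype = None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # Simplification: an "I-" tag that does not continue an open entity is ignored.
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over exact-match entity spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # same type and same boundaries
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: the model finds the disease mention but misses the drug.
gold = [["B-DISEASE", "I-DISEASE", "O", "B-DRUG"]]
pred = [["B-DISEASE", "I-DISEASE", "O", "O"]]
print(entity_f1(gold, pred))  # P = 1.0, R = 0.5, F1 ≈ 0.667
```

Micro-averaging pools true positives across all sentences and entity types before dividing, so frequent entity types weigh more heavily; a macro average would instead compute F1 per type and take the mean.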

Conclusions: The proposed imConvNet-based model significantly improves the recognition accuracy of Chinese medical named entities and is applicable to various medical corpora.

Keywords: BERT; BiLSTM-CRF; Chinese electronic medical records; Convolutional neural network; Named entity recognition.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

China
Data Mining
Deep Learning*
Humans
Language
Names*

Grants and funding

Qiankehe support normal [2022] No.272/Natural Science Foundation of Guizhou Province