Surgical procedure long terms recognition from Chinese literature incorporating structural feature

Heliyon. 2022 Oct 29;8(11):e11291. doi: 10.1016/j.heliyon.2022.e11291. eCollection 2022 Nov.

Abstract

With rapid development of technologies in medical diagnosis and treatment, the novel and complicated concepts and usages of clinical terms especially of surgical procedures have become common in daily routine. Expected to be performed in an operating room and accompanied by an incision based on expert discretion, surgical procedures imply clinical understanding of diagnosis, examination, testing, equipment, drugs and symptoms, etc., but terms expressing surgical procedures are difficult to recognize since the terms are highly distinctive due to long morphological length and complex linguistics phenomena. To achieve higher recognition performance and overcome the challenge of the absence of natural delimiters in Chinese sentences, we propose a Named Entity Recognition (NER) model named Structural-SoftLexicon-Bi-LSTM-CRF (SSBC) empowered by pre-trained model BERT. In particular, we pre-trained a lexicon embedding over large-scale medical corpus to better leverage domain-specific structural knowledge. With input additionally augmented by BERT, rich multigranular information and structural term information is transferred from Structural-SoftLexicon to downstream model Bi-LSTM-CRF. Therefore, we could get a global optimal prediction of input sequence. We evaluate our model on a self-built corpus and results show that SSBC with pre-trained model outperforms other state-of-the-art benchmarks, surpassing at most 3.77% in F1 score. This study hopefully would benefit Diagnostic Related Groups (DRGs) and Diagnosis Intervention Package (DIP) grouping system, medical records statistics and analysis, Medicare payment system, etc.

Keywords: Long term recognition; Natural language processing; SoftLexicon; Surgical procedure.