[Automatic labeling and extraction of terms in natural language processing in acupuncture clinical literature]

Zhongguo Zhen Jiu. 2022 Mar 12;42(3):327-31. doi: 10.13703/j.0255-2930.20211107-k0002.
[Article in Chinese]

Abstract

This paper analyzes the particular challenges of term recognition in acupuncture clinical literature and compares the advantages and disadvantages of three named entity recognition (NER) methods currently applied in the field of traditional Chinese medicine. The authors conclude that the bidirectional long short-term memory network with conditional random field model (Bi LSTM-CRF) can incorporate contextual information and complete NER with fewer feature rules, which makes it suitable for term recognition in acupuncture clinical literature. Building on this model, they propose that the term recognition workflow for acupuncture clinical literature comprise four steps: literature preprocessing, sequence labeling, model training, and performance evaluation. This workflow offers an approach to structuring the terminology of acupuncture clinical literature.

分析针刺临床文献术语识别任务的特殊性,对比目前应用于中医药领域的3种命名实体识别(NER)方法的优缺点,认为双向长短期记忆神经网络-条件随机场模型(Bi LSTM-CRF)能结合上下文信息,利用较少的特征规律完成NER,适合针刺临床文献的术语识别。在此模型基础上,提出针刺临床文献术语识别流程主要包括文献预处理、序列标注、模型训练及效果评价4个方面,为针刺临床文献术语结构化提供思路。
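
The sequence labeling and performance evaluation steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state which tagging scheme, token granularity, entity types, or metrics the authors use, so everything below is an assumption: BIO tags, character-level tokens, the hypothetical entity types ACUPOINT and DISEASE, a made-up example sentence, and entity-level precision, recall, and F1 as the evaluation measures. The Bi LSTM-CRF model itself is not implemented here; `pred` stands in for its output.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_tags(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert annotated entity spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def extract_spans(tags: List[str]) -> List[Span]:
    """Recover entity spans from a BIO tag sequence."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if tag.startswith("I-") and start is not None and tag[2:] == label:
            continue  # still inside the current entity
        if start is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 (exact span and type match)."""
    gold_set, pred_set = set(extract_spans(gold)), set(extract_spans(pred))
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sentence: "Needling Zusanli to treat stomach pain", one token per character.
tokens = list("针刺足三里治疗胃痛")
# Hypothetical annotation: 足三里 is an ACUPOINT, 胃痛 is a DISEASE.
gold = bio_tags(tokens, [(2, 5, "ACUPOINT"), (7, 9, "DISEASE")])
# Stand-in for model output that truncates the disease entity to one character.
pred = bio_tags(tokens, [(2, 5, "ACUPOINT"), (7, 8, "DISEASE")])

print(list(zip(tokens, gold)))
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```

Scoring at the entity level rather than per token means a partially recognized term, such as the truncated disease name above, counts as an error. That is a stricter choice than token accuracy and is only one of several reasonable options.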

Keywords: Bi LSTM-CRF; acupuncture clinical literature; named entity recognition; term recognition.

MeSH terms

  • Acupuncture Therapy*
  • Electronic Health Records
  • Natural Language Processing*