Study on structured method of Chinese MRI report of nasopharyngeal carcinoma

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):203. doi: 10.1186/s12911-021-01547-1.

Abstract

Background: Imaging report text is an important form of text data in the medical field, as it can assist clinicians in making a diagnosis. However, owing to the diversity of language use, most descriptions in imaging reports are unstructured. The same medical phenomenon may be described in various ways, which makes structural analysis of the text challenging. The aim of this research is to develop a feasible approach that automatically converts nasopharyngeal cancer reports into structured text and builds a knowledge network.

Methods: In this work, we compare commonly used named entity recognition (NER) models, select the best-performing model as our triplet extraction model, and present a structuring algorithm for Chinese text. Finally, we visualize the output of the algorithm as a knowledge network of nasopharyngeal cancer.
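To make the pipeline in the Methods concrete, the following minimal sketch shows how character-level NER tags might be grouped into entities, paired into triplets, and collected into a simple network. It is an illustration under stated assumptions, not the authors' algorithm: the BIO tagging scheme, the `LOC` and `FIND` labels, the `has_finding` relation, and the functions `decode_bio`, `pair_triplets`, and `build_network` are all hypothetical, and the tags are written by hand rather than produced by a model.

```python
from collections import defaultdict


def decode_bio(chars, tags):
    """Group character-level BIO tags into (entity_text, entity_type) pairs."""
    entities = []
    current_chars, current_type = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush any entity in progress.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_chars.append(ch)
        else:
            # "O" tag or inconsistent "I-" tag: close the current entity.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [], None
    if current_chars:
        entities.append(("".join(current_chars), current_type))
    return entities


def pair_triplets(entities):
    """Attach each finding to the most recent anatomical site (assumed rule)."""
    triplets, site = [], None
    for text, etype in entities:
        if etype == "LOC":
            site = text
        elif etype == "FIND" and site is not None:
            triplets.append((site, "has_finding", text))
    return triplets


def build_network(triplets):
    """Store triplets as an adjacency mapping: head -> [(relation, tail), ...]."""
    network = defaultdict(list)
    for head, relation, tail in triplets:
        network[head].append((relation, tail))
    return dict(network)


# Hypothetical report fragment: "left wall of the nasopharynx is thickened".
chars = list("鼻咽左侧壁增厚")
tags = ["B-LOC", "I-LOC", "I-LOC", "I-LOC", "I-LOC", "B-FIND", "I-FIND"]

entities = decode_bio(chars, tags)
triplets = pair_triplets(entities)
network = build_network(triplets)
```

Working at the character level mirrors the claim that no word segmentation tool is needed; in a real system the tags would come from the trained NER model rather than a hand-written list.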

Results: In NER, both the accuracy and the recall of the BERT-CRF model reached 99%. The structured extraction rate is 84.74%, and the accuracy is 89.39%. The architecture, which is based on a recurrent neural network, does not rely on medical dictionaries or word segmentation tools and can perform triplet recognition.
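The abstract does not state how the reported NER scores were computed. As a rough illustration of one common convention, the sketch below scores predicted entities against gold annotations by exact match, giving entity-level precision and recall. The function `precision_recall`, the `(start, end, type)` span encoding, and the example spans are assumptions made for this example and are not data from the study.

```python
def precision_recall(predicted, gold):
    """Entity-level precision and recall using exact span-and-type matching."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical spans encoded as (start, end, entity_type).
gold_spans = {(0, 5, "LOC"), (5, 7, "FIND"), (8, 10, "LOC")}
predicted_spans = {(0, 5, "LOC"), (5, 7, "FIND"), (8, 9, "LOC"), (11, 12, "FIND")}

precision, recall = precision_recall(predicted_spans, gold_spans)
```

Here two of the four predicted spans match a gold span exactly, so precision is 2/4 and recall is 2/3; a span with a wrong boundary counts as both a false positive and a missed entity.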

Conclusions: The BERT-CRF model achieves high performance in NER, and the extracted triplets reflect the content of the imaging report. This work can provide technical support for the construction of a nasopharyngeal cancer database.

Keywords: Knowledge network; Named entity recognition; Structured medical text.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • China
  • Humans
  • Language*
  • Magnetic Resonance Imaging
  • Nasopharyngeal Carcinoma / diagnostic imaging
  • Nasopharyngeal Neoplasms* / diagnostic imaging