A label information fused medical image report generation framework

Artif Intell Med. 2024 Apr:150:102823. doi: 10.1016/j.artmed.2024.102823. Epub 2024 Feb 22.

Abstract

Medical imaging is an important tool for clinical diagnosis. Nevertheless, it is very time-consuming and error-prone for physicians to prepare imaging diagnosis reports. Therefore, it is necessary to develop some methods to generate medical imaging reports automatically. Currently, the task of medical imaging report generation is challenging in at least two aspects: (1) medical images are very similar to each other. The differences between normal and abnormal images and between different abnormal images are usually trivial; (2) unrelated or incorrect keywords describing abnormal findings in the generated reports lead to mis-communications. In this paper, we propose a medical image report generation framework composed of four modules, including a Transformer encoder, a MIX-MLP multi-label classification network, a co-attention mechanism (CAM) based semantic and visual feature fusion, and a hierarchical LSTM decoder. The Transformer encoder can be used to learn long-range dependencies between images and labels, effectively extract visual and semantic features of images, and establish long-term dependent relationships between visual and semantic information to accurately extract abnormal features from images. The MIX-MLP multi-label classification network, the co-attention mechanism and the hierarchical LSTM network can better identify abnormalities, achieving visual and text alignment fusion and multi-label diagnostic classification to better facilitate report generation. The results of the experiments performed on two widely used radiology report datasets, IU X-RAY and MIMIC-CXR, show that our proposed framework outperforms current report generation models in terms of both natural linguistic generation metrics and clinical efficacy assessment metrics. The code of this work is available online at https://github.com/watersunhznu/LIFMRG.

Keywords: Attention mechanism; Feature extraction; Medical image; Multi-modal feature fusion; Text generation.

MeSH terms

  • Communication*
  • Humans
  • Image Processing, Computer-Assisted
  • Learning
  • Linguistics
  • Physicians*
  • Semantics