A label information fused medical image report generation framework

Shuifa Sun; Zhoujunsen Mei; Xiaolong Li; Tinglong Tang; Zhanglin Su; Yirong Wu

doi:10.1016/j.artmed.2024.102823

Artif Intell Med. 2024 Apr:150:102823. doi: 10.1016/j.artmed.2024.102823. Epub 2024 Feb 22.

Authors

Shuifa Sun¹, Zhoujunsen Mei², Xiaolong Li³, Tinglong Tang², Zhanglin Su⁴, Yirong Wu⁵

Affiliations

¹ School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China; Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China.
² Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Computer and Information Technology, China Three Gorges University, Yichang, 443002, Hubei, China.
³ Yichang Key Laboratory of Intelligent Medicine, Yichang, 443002, Hubei, China; College of Economics and Management, China Three Gorges University, Yichang, 443002, Hubei, China.
⁴ School of Information Science and Technology, Hangzhou Normal University, Hangzhou, 311121, Zhejiang, China.
⁵ Institute of Advanced Studies in Humanities and Social Sciences, Beijing Normal University, Zhuhai, 519087, Guangdong, China. Electronic address: yrwu@bnu.edu.cn.

PMID: 38553163

Abstract

Medical imaging is an important tool for clinical diagnosis, but preparing imaging diagnosis reports is time-consuming and error-prone for physicians, which motivates methods that generate such reports automatically. The task is challenging in at least two respects: (1) medical images are very similar to each other, and the differences between normal and abnormal images, and between different abnormal images, are usually subtle; (2) unrelated or incorrect keywords describing abnormal findings in the generated reports lead to miscommunication. In this paper, we propose a medical image report generation framework composed of four modules: a Transformer encoder, a MIX-MLP multi-label classification network, a co-attention mechanism (CAM) for semantic and visual feature fusion, and a hierarchical LSTM decoder. The Transformer encoder learns long-range dependencies between images and labels, extracts visual and semantic features of the images, and relates visual and semantic information across long ranges so that abnormal features can be extracted accurately. The MIX-MLP multi-label classification network, the co-attention mechanism, and the hierarchical LSTM network together identify abnormalities better, performing multi-label diagnostic classification and aligning and fusing visual and textual features to support report generation. Experiments on two widely used radiology report datasets, IU X-RAY and MIMIC-CXR, show that the proposed framework outperforms current report generation models on both natural language generation metrics and clinical efficacy metrics. The code for this work is available online at https://github.com/watersunhznu/LIFMRG.
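
As a rough illustration of how predicted labels and visual features might be fused before decoding, the sketch below applies sigmoid thresholding for multi-label prediction and then lets a decoder state attend separately over visual features and label embeddings. It is not the authors' implementation: the function names (`predict_labels`, `attend`, `co_attention`), the 0.5 threshold, the toy vectors, and the use of scaled dot-product scoring are all assumptions made for this example, since the abstract does not specify these details.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_labels(logits, threshold=0.5):
    # Multi-label prediction: each label is decided independently
    # by a sigmoid (the threshold value is an assumption).
    return [i for i, z in enumerate(logits)
            if 1.0 / (1.0 + math.exp(-z)) >= threshold]

def attend(query, features):
    # Scaled dot-product attention of one query over feature vectors
    # (a stand-in for whatever scoring function the framework uses).
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features))
               for i in range(dim)]
    return context, weights

def co_attention(hidden, visual_feats, label_embeds):
    # Attend over visual and semantic (label) features with the same
    # decoder state, then concatenate the two context vectors.
    v_ctx, v_w = attend(hidden, visual_feats)
    if label_embeds:
        s_ctx, s_w = attend(hidden, label_embeds)
    else:
        # No label predicted: fall back to a zero semantic context.
        s_ctx, s_w = [0.0] * len(hidden), []
    return v_ctx + s_ctx, v_w, s_w

# Toy inputs (hypothetical values, 2-dimensional for readability).
hidden = [1.0, 0.0]                                    # decoder state
visual_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]    # image regions
label_table = [[0.2, 0.1], [0.9, 0.3], [0.0, 1.0]]     # label embeddings
logits = [-2.0, 1.5, 0.3]                              # classifier output

labels = predict_labels(logits)
label_embeds = [label_table[i] for i in labels]
fused, v_w, s_w = co_attention(hidden, visual_feats, label_embeds)
print(labels, len(fused))
```

In a full model the fused vector would be passed to the decoder at each step; here it only shows that the fused representation carries both a visual and a label-derived component.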

Keywords: Attention mechanism; Feature extraction; Medical image; Multi-modal feature fusion; Text generation.

MeSH terms

Communication*
Humans
Image Processing, Computer-Assisted
Learning
Linguistics
Physicians*
Semantics