Hierarchical medical image report adversarial generation with hybrid discriminator

Artif Intell Med. 2024 May:151:102846. doi: 10.1016/j.artmed.2024.102846. Epub 2024 Mar 21.

Abstract

Background and objectives: Generating coherent reports from medical images is an important task for reducing doctors' workload. Unlike traditional image captioning, medical image report generation poses additional challenges. Current models often fail to characterize certain abnormal findings, and others produce low-quality reports. In this study, we propose a model that generates high-quality reports from medical images.

Methods: In this paper, we propose a model called the Hybrid Discriminator Generative Adversarial Network (HDGAN), which combines a Generative Adversarial Network (GAN) with Reinforcement Learning (RL). The HDGAN model consists of a generator, a one-sentence discriminator, and a one-word discriminator. Specifically, the RL reward signals are computed separately by the one-sentence discriminator and the one-word discriminator. The one-sentence discriminator better learns sentence-level structural information, while the one-word discriminator effectively learns word-diversity information.
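The hybrid-reward idea above can be illustrated with a minimal sketch. This is not the paper's implementation: the scoring functions `sentence_disc` and `word_disc` are toy stand-ins for the two learned discriminators, and the mixing weight `alpha` and the REINFORCE-style loss are assumptions chosen only to show how two reward signals could be blended for policy-gradient training.

```python
import math

def sentence_disc(sentence):
    """Toy sentence-level score in [0, 1]: stand-in for the one-sentence
    discriminator, here rewarding longer, more structured sentences."""
    words = sentence.split()
    return min(len(words) / 10.0, 1.0)

def word_disc(sentence):
    """Toy word-level score in [0, 1]: stand-in for the one-word
    discriminator, here rewarding vocabulary diversity."""
    words = sentence.split()
    if not words:
        return 0.0
    return len(set(words)) / len(words)

def hybrid_reward(sentence, alpha=0.5):
    """Blend the two discriminator signals into a single RL reward.
    The weight `alpha` is a hypothetical hyperparameter."""
    return alpha * sentence_disc(sentence) + (1 - alpha) * word_disc(sentence)

def reinforce_loss(log_probs, reward, baseline=0.0):
    """REINFORCE-style loss: -(reward - baseline) * sum of token log-probs."""
    return -(reward - baseline) * sum(log_probs)

report = "the heart size is normal and the lungs are clear"
reward = hybrid_reward(report)
loss = reinforce_loss([math.log(0.8)] * 10, reward, baseline=0.3)
print(round(reward, 3), round(loss, 3))  # → 0.95 1.45
```

In this sketch, a high hybrid reward increases the gradient weight on the sampled report's token log-probabilities, pushing the generator toward outputs that satisfy both discriminators at once.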

Results: Our approach outperforms the baseline models on the IU-X-ray and COV-CTR datasets. On the ROUGE metric, our method outperforms the state-of-the-art model by 0.36 on IU-X-ray, 0.06 on MIMIC-CXR, and 0.156 on COV-CTR.

Conclusions: The proposed compositional framework can generate more accurate medical image reports at different levels.

Keywords: Generative adversarial network; Medical image report generation; Reinforcement learning.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Datasets as Topic
  • Deep Learning*
  • Diagnostic Imaging* / methods
  • Humans
  • Image Processing, Computer-Assisted* / methods
  • Neural Networks, Computer*
  • Radiography, Thoracic
  • Thorax / diagnostic imaging