Radiology report generation with a learned knowledge base and multi-modal alignment

Med Image Anal. 2023 May;86:102798. doi: 10.1016/j.media.2023.102798. Epub 2023 Mar 23.

Abstract

In clinical practice, radiology reports are crucial for guiding a patient's treatment, yet writing them imposes a heavy burden on radiologists. To this end, we present an automatic, multi-modal approach for generating reports from chest X-ray images. Our approach, motivated by the observation that the descriptions in radiology reports are highly correlated with specific information in the X-ray images, features two distinct modules: (i) a learned knowledge base: to absorb the knowledge embedded in radiology reports, we build a knowledge base that automatically distills and stores medical knowledge from textual embeddings without manual labor; and (ii) multi-modal alignment: to promote semantic alignment among reports, disease labels, and images, we explicitly use textual embeddings to guide the learning of the visual feature space. We evaluate the proposed model with both natural language generation and clinical efficacy metrics on the public IU-Xray and MIMIC-CXR datasets. Our ablation study shows that each module contributes to improving the quality of the generated reports, and with the assistance of both modules our approach outperforms state-of-the-art methods on almost all metrics. Code is available at https://github.com/LX-doctorAI1/M2KT.
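
To make the two modules concrete, the sketch below shows one plausible PyTorch realization: a bank of trainable memory slots read by attention over visual features (a learned knowledge base) and a cosine alignment loss that pulls pooled image embeddings toward report embeddings (multi-modal alignment). All names, dimensions, and the specific loss are illustrative assumptions, not the authors' exact M2KT implementation.

```python
# Minimal sketch, assuming 512-d features and a cosine alignment loss;
# this is NOT the authors' exact M2KT code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnedKnowledgeBase(nn.Module):
    """Trainable memory slots read by attention over visual queries."""
    def __init__(self, num_slots: int = 64, dim: int = 512):
        super().__init__()
        # Memory matrix learned end-to-end; no manual knowledge labeling.
        self.memory = nn.Parameter(torch.randn(num_slots, dim) * 0.02)

    def forward(self, visual_feats: torch.Tensor) -> torch.Tensor:
        # visual_feats: (batch, num_regions, dim)
        scale = visual_feats.size(-1) ** 0.5
        attn = torch.softmax(visual_feats @ self.memory.t() / scale, dim=-1)
        retrieved = attn @ self.memory        # (batch, num_regions, dim)
        return visual_feats + retrieved       # knowledge-enhanced features

def alignment_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
    """Pull pooled image embeddings toward report text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    return (1.0 - (img_emb * txt_emb).sum(dim=-1)).mean()

# Toy usage: 2 images, 49 visual regions, 512-d features.
kb = LearnedKnowledgeBase()
feats = torch.randn(2, 49, 512)
enhanced = kb(feats)                          # would feed the report decoder
loss = alignment_loss(enhanced.mean(dim=1), torch.randn(2, 512))
```

In this reading, the knowledge base injects report-derived knowledge into the visual stream at encoding time, while the alignment loss shapes the visual feature space during training; the decoder itself is unchanged.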

Keywords: Knowledge base; Multi-modal alignment; Radiology report generation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Benchmarking
  • Humans
  • Knowledge Bases
  • Learning
  • Radiography
  • Radiology*