[Memory-driven automatic generation of multimodal medical image reports]

Sheng Wu Yi Xue Gong Cheng Xue Za Zhi. 2024 Feb 25;41(1):60-69. doi: 10.7507/1001-5515.202304001.
[Article in Chinese]

Abstract

The task of automatically generating medical image reports faces several challenges, including the diversity of disease types and a lack of professionalism and fluency in report descriptions. To address these issues, this paper proposes a memory-driven multimodal medical image report generation method (mMIRmd). First, a hierarchical vision transformer using shifted windows (Swin-Transformer) extracts multi-perspective visual features from patient medical images, and bidirectional encoder representations from transformers (BERT) extracts semantic features from the textual medical history. The visual and semantic features are then fused to improve the model's ability to recognize different disease types. Next, a word-vector dictionary pretrained on medical text is used to encode the labels of the visual features, enhancing the professionalism of the generated reports. Finally, a memory-driven module is introduced in the decoder to model long-range dependencies in medical image data. The method is validated on the chest X-ray dataset collected at Indiana University (IU X-Ray) and the Medical Information Mart for Intensive Care Chest X-ray (MIMIC-CXR) dataset released by the Massachusetts Institute of Technology and Massachusetts General Hospital. Experimental results show that the proposed method attends better to diseased regions, improves the accuracy and fluency of the generated reports, and can help radiologists complete medical image reports more quickly.
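To make the pipeline described in the abstract concrete, below is a minimal PyTorch sketch of an architecture of this shape: a Swin encoder for the image, a BERT encoder for the history text, concatenation fusion, and a report decoder conditioned on learnable, gated memory slots. All class names (MemoryDrivenDecoder, MMIRmdSketch), layer sizes, the gated update rule, and the concatenation fusion are illustrative assumptions rather than the authors' released code; the pretrained word-vector label-encoding step is omitted.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel, SwinConfig, SwinModel


class MemoryDrivenDecoder(nn.Module):
    """Hypothetical simplification of a memory-driven decoder: learnable
    memory slots, refreshed by a gated residual update, are prepended to
    the fused features that the report decoder attends to."""

    def __init__(self, d_model=256, n_slots=8, vocab_size=4000):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_slots, d_model))  # persistent slots
        self.gate = nn.Linear(2 * d_model, d_model)                # gated slot update
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, report_ids, fused):            # fused: (B, L, d_model)
        B, T = fused.size(0), report_ids.size(1)
        mem = self.memory.unsqueeze(0).expand(B, -1, -1)
        ctx = fused.mean(dim=1, keepdim=True).expand_as(mem)
        g = torch.sigmoid(self.gate(torch.cat([mem, ctx], dim=-1)))
        mem = g * mem + (1 - g) * ctx                # gated memory update
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.decoder(self.embed(report_ids),
                         torch.cat([mem, fused], dim=1), tgt_mask=causal)
        return self.out(h)                           # (B, T, vocab_size)


class MMIRmdSketch(nn.Module):
    def __init__(self, d_model=256):
        super().__init__()
        # Small randomly initialised encoders keep the sketch self-contained
        # and offline; the paper uses pretrained Swin-Transformer and BERT.
        self.vision = SwinModel(SwinConfig(embed_dim=48))
        self.text = BertModel(BertConfig(hidden_size=256, num_hidden_layers=2,
                                         num_attention_heads=4,
                                         intermediate_size=512))
        self.v_proj = nn.Linear(self.vision.config.hidden_size, d_model)
        self.t_proj = nn.Linear(self.text.config.hidden_size, d_model)
        self.decoder = MemoryDrivenDecoder(d_model=d_model)

    def forward(self, images, history_ids, report_ids):
        v = self.v_proj(self.vision(pixel_values=images).last_hidden_state)
        t = self.t_proj(self.text(input_ids=history_ids).last_hidden_state)
        fused = torch.cat([v, t], dim=1)             # simple concatenation fusion
        return self.decoder(report_ids, fused)


model = MMIRmdSketch()
logits = model(torch.randn(2, 3, 224, 224),          # chest X-ray batch
               torch.randint(0, 100, (2, 16)),       # tokenized history
               torch.randint(0, 4000, (2, 32)))      # shifted report tokens
print(logits.shape)                                  # torch.Size([2, 32, 4000])
```

The memory slots here stand in for the long-range dependency mechanism the abstract describes: because they persist across decoding and are updated from the fused features, the decoder can attend to a compact summary of earlier context in addition to the raw visual and textual tokens.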


Keywords: Automatic report generation; Medical imaging; Memory-driven; Multimodal feature fusion.

Publication types

  • English Abstract

MeSH terms

  • Critical Care*
  • Electric Power Supplies*
  • Humans
  • Semantics
  • Technology

Grants and funding

National Natural Science Foundation of China (61671028); Beijing Natural Science Foundation (KZ202110011015)