Medical visual question answering based on question-type reasoning and semantic space constraint

Meiling Wang; Xiaohai He; Luping Liu; Linbo Qing; Honggang Chen; Yan Liu; Chao Ren

doi:10.1016/j.artmed.2022.102346

Artif Intell Med. 2022 Sep:131:102346. doi: 10.1016/j.artmed.2022.102346. Epub 2022 Jun 30.

Authors

Meiling Wang¹, Xiaohai He², Luping Liu¹, Linbo Qing¹, Honggang Chen¹, Yan Liu³, Chao Ren¹

Affiliations

¹ College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan 610065, China.
² College of Electronics and Information Engineering, Sichuan University, Chengdu, Sichuan 610065, China. Electronic address: hxh@scu.edu.cn.
³ Department of Neurology, The Affiliated Hospital of Southwest Jiaotong University, The Third People's Hospital of Chengdu, Sichuan, China.

PMID: 36100340

Abstract

Medical visual question answering (Med-VQA) aims to accurately answer clinical questions about medical images. Despite its enormous potential for application in the medical domain, the technology is still in its infancy. Compared with general visual question answering, Med-VQA poses more demanding challenges. First, clinical questions about medical images are usually diverse owing to differences among clinicians and the complexity of diseases; consequently, noise is inevitably introduced when question features are extracted. Second, Med-VQA has always been regarded as a classification problem over a set of predefined answers, which ignores the relationships between candidate answers; as a result, the model pays equal attention to all candidate answers when making a prediction. In this paper, a novel Med-VQA framework is proposed to alleviate these problems. Specifically, we applied a question-type reasoning module separately to closed-ended and open-ended questions, using an attention mechanism to extract the important information contained in each question and filter out noise, thereby obtaining more valuable question features. To exploit the relational information between answers, we designed a semantic constraint space that computes the similarity between answers and assigns higher attention to highly correlated answers. To evaluate the effectiveness of the proposed method, extensive experiments were conducted on the public VQA-RAD dataset. Experimental results showed that the proposed method outperformed other state-of-the-art methods, reaching an overall accuracy of 74.1%, a closed-ended accuracy of 82.7%, and an open-ended accuracy of 60.9%. Notably, the absolute accuracy on closed-ended questions improved by 5.5%.
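To make the idea of question-type reasoning concrete, here is a minimal sketch of attention pooling over question word features, where the attention query depends on whether the question is closed-ended or open-ended. It is an illustration under stated assumptions, not the authors' module: the question type is assumed to be known in advance, and `attend_question`, `type_queries`, and the toy feature values are all hypothetical. Low-scoring words receive small weights, which is one simple way an attention mechanism can suppress noisy tokens.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend_question(
    word_feats: List[Vector],
    question_type: str,
    type_queries: Dict[str, Vector],
) -> Tuple[Vector, List[float]]:
    """Attention-pool word features with a query chosen by question type.

    `type_queries` is a hypothetical stand-in for learned, type-specific
    parameters: one query for "closed" questions and one for "open" ones.
    Words that score low against the selected query get small weights,
    so they contribute little to the pooled question feature.
    """
    query = type_queries[question_type]
    weights = softmax([dot(w, query) for w in word_feats])
    dim = len(word_feats[0])
    # Weighted sum of word vectors, one output dimension at a time.
    pooled = [sum(a * w[d] for a, w in zip(weights, word_feats)) for d in range(dim)]
    return pooled, weights


# Toy 2-D features for a three-word question (values are invented).
words = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
queries = {"closed": [2.0, 0.0], "open": [0.0, 2.0]}
closed_feat, closed_att = attend_question(words, "closed", queries)
open_feat, open_att = attend_question(words, "open", queries)
print([round(a, 3) for a in closed_att])
print([round(a, 3) for a in open_att])
```

The same words receive different weights depending on the branch; routing by question type lets each branch learn which parts of a question matter for its kind of answer.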
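The abstract does not specify how the semantic constraint space is computed. One plausible reading, sketched here under that caveat, is to replace the usual one-hot training target with a distribution weighted by each candidate answer's similarity to the correct answer, so that semantically related answers are not penalized as heavily as unrelated ones. The answer embeddings, the `temperature` knob, and the names `soft_targets` and `soft_cross_entropy` are hypothetical; the paper's actual formulation may differ.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def soft_targets(answer_embs: List[Vector], gt_index: int, temperature: float = 0.5) -> List[float]:
    """Turn similarity to the ground-truth answer into a target distribution.

    Candidates close to the correct answer in embedding space get more
    probability mass than unrelated ones, instead of a one-hot target.
    Lower `temperature` makes the distribution sharper.
    """
    gt = answer_embs[gt_index]
    scaled = [cosine(gt, e) / temperature for e in answer_embs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(pred_probs: List[float], targets: List[float]) -> float:
    """Cross-entropy between predicted answer probabilities and soft targets."""
    eps = 1e-12  # avoids log(0)
    return -sum(t * math.log(p + eps) for p, t in zip(pred_probs, targets))


# Toy answer embeddings: answer 1 is a near-paraphrase of answer 0,
# answer 2 is unrelated (values are invented).
answers = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
targets = soft_targets(answers, gt_index=0)
loss_related = soft_cross_entropy([0.5, 0.45, 0.05], targets)
loss_unrelated = soft_cross_entropy([0.05, 0.05, 0.9], targets)
print([round(t, 3) for t in targets], round(loss_related, 3), round(loss_unrelated, 3))
```

Under this reading, a prediction that spreads its mass over answers related to the correct one incurs a lower loss than one that favors an unrelated answer, which is the behavior the abstract describes as assigning higher attention to highly correlated answers.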

Keywords: Attention mechanism; Medical visual question answering; Question-type reasoning; Semantic space constraint.

MeSH terms

Algorithms
Attention
Image Interpretation, Computer-Assisted / methods
Image Processing, Computer-Assisted / methods
Semantics*