K-PathVQA: Knowledge-Aware Multimodal Representation for Pathology Visual Question Answering

Usman Naseem; Matloob Khushi; Adam G Dunn; Jinman Kim

doi:10.1109/JBHI.2023.3294249

IEEE J Biomed Health Inform. 2023 Jul 11:PP. doi: 10.1109/JBHI.2023.3294249. Online ahead of print.

PMID: 37432797

Abstract

Pathology imaging is routinely used to detect the underlying effects and causes of diseases or injuries. Pathology visual question answering (PathVQA) aims to enable computers to answer questions about clinical visual findings from pathology images. Prior work on PathVQA has focused on directly analyzing the image content with conventional pretrained encoders, without drawing on relevant external information when the image content alone is inadequate. In this paper, we present a knowledge-driven PathVQA method (K-PathVQA), which uses a medical knowledge graph (KG) from a complementary external structured knowledge base to infer answers for the PathVQA task. K-PathVQA improves the question representation with external medical knowledge and then aggregates vision, language, and knowledge embeddings to learn a joint knowledge-image-question representation. Our experiments on a publicly available PathVQA dataset show that K-PathVQA outperforms the best baseline method, with an increase of 4.15% in accuracy on the overall task, an increase of 4.40% on open-ended questions, and an absolute increase of 1.03% on closed-ended questions. Ablation tests show the impact of each contribution, and experiments on a separate medical VQA dataset demonstrate the method's generalizability.
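
The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the embeddings are combined, so everything here is an assumption made for illustration: the function names `enrich_question` and `fuse`, the use of concatenation, the single tanh projection, the random vectors standing in for encoder outputs, and all dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np


def enrich_question(question_emb, knowledge_emb):
    """Hypothetical enrichment: append the KG embedding to the question embedding."""
    return np.concatenate([question_emb, knowledge_emb])


def fuse(image_emb, enriched_question_emb, weight, bias):
    """Hypothetical fusion: concatenate, then apply one tanh-activated linear projection."""
    joint_input = np.concatenate([image_emb, enriched_question_emb])
    return np.tanh(weight @ joint_input + bias)


# Random placeholders standing in for encoder outputs (assumed sizes).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=8)       # stand-in for a vision encoder output
question_emb = rng.normal(size=6)    # stand-in for a language encoder output
knowledge_emb = rng.normal(size=4)   # stand-in for a knowledge graph embedding

# Projection from the 8 + 6 + 4 = 18 concatenated features to a 5-dimensional joint space.
weight = rng.normal(size=(5, 18))
bias = np.zeros(5)

joint = fuse(image_emb, enrich_question(question_emb, knowledge_emb), weight, bias)
print(joint.shape)
```

In a trained model, `weight` and `bias` would be learned and the joint vector would feed an answer classifier or decoder; the sketch only shows how three separate embeddings can be merged into one representation.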