Visual question generation for explicit questioning purposes based on target objects

Neural Netw. 2023 Oct:167:638-647. doi: 10.1016/j.neunet.2023.08.007. Epub 2023 Aug 24.

Abstract

Visual question generation aims to focus on some target objects in an image to generate questions with certain questioning purposes. Existing studies mainly utilize an answer to extract the target object corresponding to the questioning purpose for questioning. However, answers fail to accurately and completely map to every target object, such as the objects corresponding to the answer are ambiguous or the answers are the relationship between multiple objects. To address this problem, we propose a content-controlled question generation model, which generates questions based on a given target object set specified from an image. Considering that the target objects have different contributions during the generation process, we design a recurrent generative architecture to explicitly control attention to different objects and their corresponding image information at each generative stage. Extensive experiments on the VQA v2.0 dataset and the Visual7w dataset show that the proposed model outperforms the state-of-the-art models and can controllably generate questions with specified content.

Keywords: Questioning purposes; Target object; Visual question generation.