Collaborative reasoning for knowledge-based visual question answering is challenging, but it is vital for efficiently understanding the features of images and questions. Previous methods either jointly fuse all kinds of features with an attention mechanism or use handcrafted rules to generate a layout for compositional reasoning; these approaches lack an explicit visual reasoning process and introduce a large number of parameters for predicting the correct answer. To conduct visual reasoning on all kinds of image–question pairs, we propose a novel reasoning model, a question-guided tree structure with a knowledge base (QGTSKB), that addresses these problems. Our model consists of four neural module networks: the attent...
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural lan...
Visual question answering (VQA) demands simultaneous comprehension of both the image visual content ...
Vision-and-language tasks (such as answering a question about an image, grounding a referring expres...
We describe a method for visual question answering which is capable of reasoning about an image on t...
Accurately answering a question about a given image requires combining observations with general kno...
Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to ...
Many vision and language tasks require commonsense reasoning beyond data-driven image and natural la...
Knowledge-based visual question answering (QA) aims to answer a question which requires visually-gro...
Achieving artificial visual reasoning — the ability to answer image-related qu...
In visual reasoning, the achievement of deep learning significantly improved the accuracy of results...
Humans have a remarkable capability to learn new concepts, process them in relation to their existin...
Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natura...
Visual Question Answering (VQA) is the task of answering questions based on an image. The field has ...
Visual Question Answering (VQA) aims to output a correct answer based on cross‐modality inp...
CVPR 2019 accepted paper. Multimodal attentional networks are currently state-of-...