This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that making better use of regional information can significantly improve performance. While visual representation has been extensively studied in traditional VQA, it is under-explored in knowledge-based VQA, even though the two tasks share a common spirit, i.e., both rely on visual input to answer the question. Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding-window manner for retrieving knowledge, and the important relationships within and among object regions are neglected; 2) visual features are not well utilized in the final an...
We propose a method for visual question answering which combines an internal representation of the c...
The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely...
Visual Question Answering (VQA) requires a simultaneous understanding of images and questions. Exist...
Humans have a remarkable capability to learn new concepts, process them in relation to their existin...
Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natura...
Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to ...
We describe a method for visual question answering which is capable of reasoning about an image on t...
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural lan...
Despite its importance for assessing the effectiveness of communicating information visually, fine-g...
Visual Question Answering (VQA) is the task of answering questions based on an image. The field has ...
Whether to retrieve, answer, translate, or reason, multimodality opens up new ...
Accurately answering a question about a given image requires combining observations with general kno...
Visual Question Answering (VQA) aims to output a correct answer based on cross‐modality inp...
Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Comp...
Collaborative reasoning for knowledge-based visual question answering is challenging but vital and e...