Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natural Language Processing and Artificial Intelligence (AI). In conventional VQA, one may ask questions about an image that can be answered purely from its content. For example, given an image with people in it, a typical VQA question may ask how many people are in the image. More recently, there has been growing interest in answering questions that require commonsense knowledge involving common nouns (e.g., cats, dogs, microphones) present in the image. In spite of this progress, the important problem of answering questions requiring world knowledge about named entities (e.g., Barack Obama, White House, United Nations) in the image has...
Visual question answering on document images that contain textual, visual, and layout information, c...
This paper revisits visual representation in knowledge-based visual question answering (VQA) and dem...
We propose a method for visual question answering which combines an internal representation of the c...
Whether to retrieve, answer, translate, or reason, multimodality opens up new ...
Humans have a remarkable capability to learn new concepts, process them in relation to their existin...
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural lan...
Knowledge-based visual question answering (VQA) is a vision-language task that requires an agent to ...
Visual Question Answering (VQA) is the task of answering questions based on an image. The field has ...
Accurately answering a question about a given image requires combining observations with general kno...
We describe a method for visual question answering which is capable of reasoning about an image on t...
Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Comp...
One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredic...
Knowledge-based visual question answering (QA) aims to answer a question which requires visually-gro...
Collaborative reasoning for knowledge-based visual question answering is challenging but vital and e...
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an i...