Whether to retrieve, answer, translate, or reason, multimodality opens up new challenges and perspectives. In this context, we are interested in answering questions about named entities grounded in a visual context using a Knowledge Base (KB). To benchmark this task, called KVQAE (Knowledge-based Visual Question Answering about named Entities), we provide ViQuAE, a dataset of 3.7K questions paired with images. This is the first KVQAE dataset to cover a wide range of entity types (e.g., persons, landmarks, and products). The dataset is annotated using a semi-automatic method. We also propose a KB composed of 1.5M Wikipedia articles paired with images. To set a baseline on the benchmark, we address KVQAE as a two-stage pr...
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural lan...
Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Comp...
We propose a method for visual question answering which combines an internal representation of the c...
In the context of multimodal processing, we focus our work on Knowledge-based V...
Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natura...
We present a new pre-training method, Multimodal Inverse Cloze Task, for Knowledge-based Visual Ques...
Visual Question Answering is about answering questions about images. These questions mostly related ...
Humans have a remarkable capability to learn new concepts, process them in relation to their existin...
Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task requiring external world knowle...
Accurately answering a question about a given image requires combining observations with general kno...
We describe a method for visual question answering which is capable of reasoning about an image on t...
Visual question answering (VQA) demands simultaneous comprehension of both the image visual content ...
This paper revisits visual representation in knowledge-based visual question answering (VQA) and dem...