Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task requiring external world knowledge to correctly answer a text question about an associated image. Recent single-modality text work has shown that knowledge injection into pre-trained language models, specifically entity-enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks. In this work, we empirically study how and whether such methods, applied in a bi-modal setting, can improve an existing VQA system's performance on the KBVQA task. We experiment with two large publicly available VQA datasets: (1) KVQA, which contains mostly rare Wikipedia entities, and (2) OKVQA, which is less entity-centric and more aligned with common sense reason...
Knowledge-aware question answering (KAQA) requires the model to answer questions over a knowledge ba...
We present Visual Knowledge oriented Programming platform (VisKoP), a knowledge base question answer...
The fields of computer vision and natural language processing have made significant advances in visual...
Whether to retrieve, answer, translate, or reason, multimodality opens up new ...
Integrating outside knowledge for reasoning in visio-linguistic tasks such as visual question answer...
Given visual input and a natural language question about it, the visual question answering (VQA) tas...
The open-ended question answering task of Text-VQA often requires reading and reasoning about rarely...
Visual Question Answering (VQA) has emerged as an important problem spanning Computer Vision, Natura...
Visual Question Answering (VQA) is the task of answering questions based on an image. The field has ...
Humans have a remarkable capability to learn new concepts, process them in relation to their existin...
We present a new pre-training method, Multimodal Inverse Cloze Task, for Knowledge-based Visual Ques...
Accurately answering a question about a given image requires combining observations with general kno...
Visual question answering (VQA) is challenging not only because the model has to handle multi-modal ...