This work aims to address the problem of image-based question answering (QA) with new models and datasets. We propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset with more evenly distributed answers. A suite of baseline results on this new dataset is also presented.
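The idea of converting image descriptions into QA form can be illustrated with a toy sketch. It is not the authors' algorithm. It assumes short, simple captions and replaces real syntactic analysis with small hand-written word lists (the hypothetical `NUMBER_WORDS` and `COLOR_WORDS`). It also produces only two question types: counting and color. The `cap_answer_frequency` helper is likewise hypothetical. It shows one simple way to stop a single answer from dominating a generated dataset and is not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical, hand-picked vocabularies standing in for a real parser/lexicon.
NUMBER_WORDS = {"two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}
COLOR_WORDS = {"red", "blue", "green", "yellow", "black", "white",
               "brown", "orange", "pink", "gray"}
# Tokens that cannot be the noun modified by a color word (toy heuristic).
SKIP_NOUNS = {"and", "or"} | COLOR_WORDS


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def generate_qa(caption: str) -> List[QAPair]:
    """Turn one caption into zero or more (question, answer) pairs.

    Toy rules only: "<number> <noun>" yields a counting question and
    "<color> <noun>" yields a color question. No parsing is performed.
    """
    tokens = [t.strip(".,;:!?") for t in caption.lower().split()]
    tokens = [t for t in tokens if t]
    pairs: List[QAPair] = []
    for tok, nxt in zip(tokens, tokens[1:]):
        if tok in NUMBER_WORDS:
            # "two dogs play in the park" -> "how many dogs are there?" / "two"
            pairs.append(QAPair(f"how many {nxt} are there?", tok))
        elif tok in COLOR_WORDS and nxt not in SKIP_NOUNS:
            # "a man in a red shirt" -> "what is the color of the shirt?" / "red"
            pairs.append(QAPair(f"what is the color of the {nxt}?", tok))
    return pairs


def cap_answer_frequency(pairs: List[QAPair], max_per_answer: int) -> List[QAPair]:
    """Keep at most `max_per_answer` pairs per distinct answer (first come, first kept).

    A hypothetical balancing step: it evens out the answer distribution by
    discarding surplus pairs whose answer is already frequent.
    """
    counts: Counter = Counter()
    kept: List[QAPair] = []
    for pair in pairs:
        if counts[pair.answer] < max_per_answer:
            counts[pair.answer] += 1
            kept.append(pair)
    return kept


if __name__ == "__main__":
    captions = [
        "Two dogs play in the park.",
        "A man in a red shirt rides a bike.",
        "A man sits on a bench.",
    ]
    dataset = [qa for c in captions for qa in generate_qa(c)]
    for qa in cap_answer_frequency(dataset, max_per_answer=1000):
        print(qa.question, "->", qa.answer)
```

Captions that match no rule, such as the bench example, simply contribute no pairs. This mirrors the general trade-off of automatic generation: coverage is limited to patterns the generator recognizes, in exchange for scaling to large caption collections without manual annotation.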