This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is that it requires joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as a series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these...
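As a rough illustration of the idea in the abstract above (graphs over scene objects and over question words), the following minimal sketch builds both graphs as plain edge dictionaries. It is not the authors' implementation: the `SceneObject` type, the choice of relative offsets as edge features, and the dependency triples passed to `build_question_graph` are all assumptions made for illustration, and the neural network that would consume these graphs is left out.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass
class SceneObject:
    """Hypothetical scene node: an object label and the centre of its box."""
    label: str
    x: float
    y: float


def build_scene_graph(objects):
    """Connect every pair of objects (assumed fully connected graph).

    Each edge stores the relative offset (dx, dy) and the Euclidean
    distance between the two object centres.
    """
    edges = {}
    for i, j in combinations(range(len(objects)), 2):
        dx = objects[j].x - objects[i].x
        dy = objects[j].y - objects[i].y
        edges[(i, j)] = (dx, dy, math.hypot(dx, dy))
    return edges


def build_question_graph(words, dependencies):
    """Connect question words along syntactic dependencies.

    `dependencies` is a list of (head_index, dependent_index, relation)
    triples, assumed to come from an external parser.
    """
    n = len(words)
    edges = {}
    for head, dep, relation in dependencies:
        if not (0 <= head < n and 0 <= dep < n):
            raise ValueError("dependency index out of range")
        edges[(head, dep)] = relation
    return edges


# Two instances of the same object class remain distinct nodes,
# which a single pooled feature vector would not guarantee.
scene = [SceneObject("cat", 0.0, 0.0), SceneObject("cat", 3.0, 4.0)]
scene_edges = build_scene_graph(scene)

question = ["how", "many", "cats"]
# Hand-written, illustrative dependency triples (not parser output).
question_edges = build_question_graph(
    question, [(2, 1, "amod"), (1, 0, "advmod")]
)
```

In this toy scene the two cats are separate nodes joined by one edge, which is the kind of multiple-instance situation the abstract says CNN feature vectors struggle to capture.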
We propose a novel attention-based deep learning architecture for the visual question answering task (V...
Visual Question Answering (VQA) raises a great challenge for computer vision and natural language pr...
In the past few years, Visual Question Answering (VQA) has seen immense progress both in terms of ac...
Visual question answering (VQA) is a challenging problem in machine perception, which requires a dee...
Visual Question Answering (VQA) is a stimulating process in the field of Natural Language Processing...
Recently, the Visual Question Answering (VQA) task has gained increasing attention in artificial int...
To date, visual question answering (VQA) (i.e., image QA and video QA) is still a holy grail in visi...
Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Comp...
Computer Vision is a scientific discipline which involves the development of an algorithmic basis fo...
Visual question answering (VQA) demands simultaneous comprehension of both the image visual content ...
Many vision and language tasks require commonsense reasoning beyond data-driven image and natural la...
© 2018 IEEE. Visual question answering (VQA) is challenging, because it requires a simultaneous unde...
Accurately answering a question about a given image requires combining observations with general kno...
In this paper, we propose a novel multi-modal framework for Scene Text Visual Question Answering (ST...
We propose a method for visual question answering which combines an internal representation of the c...