Attention is an essential mechanism by which humans process massive amounts of data: it omits the trivial parts and focuses on the important ones. For example, to reconstruct a source, we only need to remember the keywords of a long sentence or the principal objects in an image. It is therefore crucial to build attention networks that allow artificial intelligence to solve problems the way humans do. This mechanism has been thoroughly explored in text-based tasks, such as machine translation, reading comprehension, and sentiment analysis, as well as in vision-based tasks, such as image recognition, object detection, and action recognition. In this work, we explore the attention mechanism in multi-modal tasks, which involve the inputs of both text...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
Visual Question Answering~(VQA) requires a simultaneous understanding of images and questions. Exist...
In this paper, we aim to obtain improved attention for a visual question answering (VQA) task. It is...
Attention mechanisms have been widely applied in the Visual Question Answering (VQA) task, as they h...
We propose a novel attention-based deep learning architecture for the visual question answering task (V...
© 2019 Association for Computational Linguistics. Visual dialog (VisDial) is a task which requires a d...
Visual question answering (VQA) is regarded as a multi-modal fine-grained feature fusion task, which...
Visual question answering (VQA) is a challenging problem in machine perception, which requires a dee...
CVPR 2019 accepted paper. Multimodal attentional networks are currently state-of-...
Recently, several deep learning models have been proposed that operate on graph-structured data. These mod...
Visual Question Answering (VQA) is a recently proposed multimodal task in the general area of machin...
Visual Question Answering (VQA) is a task for evaluating image scene understanding abilities and sho...
© 2017 IEEE. Visual question answering (VQA) is challenging because it requires a simultaneous under...
This paper proposes to improve visual question answering (VQA) with structured representations of bo...
Visual Relational Reasoning is crucial for many vision-and-language based tasks, such as Visual Ques...