Visual Question Answering (VQA) has witnessed tremendous progress in recent years. However, most efforts focus only on 2D image question answering. In this paper, we present the first attempt at extending VQA to the 3D domain, which can facilitate artificial intelligence's perception of real-world 3D scenarios. Unlike image-based VQA, 3D Question Answering (3DQA) takes a color point cloud as input and requires comprehension of both appearance and 3D geometry to answer 3D-related questions. To this end, we propose a novel transformer-based 3DQA framework, "3DQA-TR", which consists of two encoders that exploit appearance and geometry information, respectively. The multi-modal information of appearance, geomet...
Humans have amazing visual perception which allows them to comprehend what the eyes see. In the core...
Visual Question Answering (VQA) is the task of answering questions based on an image. The field has ...
Recently, algorithms for object recognition and related tasks have become sufficiently proficient th...
We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the 3D-QA task, ...
Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D...
We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answeri...
Understanding visual question answering is going to be crucial for numerous human activities. Howeve...
There has been immense progress in the fields of computer vision, object detection and natural langu...
Using deep learning, computer vision now rivals people at object recognition and detection, opening ...
Given visual input and a natural language question about it, the visual question answering (VQA) tas...
Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Comp...
The task of visual question answering (VQA) is receiving increasing interest from researchers in bot...
In this dissertation, I propose and study a multi-modal Artificial Intelligence (AI) task called Vis...
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural lan...
In recent years, visual question answering (VQA) has become topical. The premise of VQA's significan...