Embodied Question Answering (EQA) is a recently proposed task, where an agent is placed in a rich 3D environment and must act based solely on its egocentric input to answer a given question. The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation, and language understanding in order to perform complex reasoning in the visual world. However, initial advancements combining standard vision and language methods with imitation and reinforcement learning algorithms have shown that EQA might be too complex and challenging for these techniques. In order to investigate the feasibility of EQA-type tasks, we build the VideoNavQA dataset that contains pairs of questions and videos generated in the House3D ...
Visual Question Answering (VQA) raises a great challenge for computer vision and natural language pr...
We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the 3D-QA task, ...
Visual Question Answering (VQA) has attracted much attention in both computer vision and natural lan...
Video Question Answering (VideoQA) is a task that requires a model to analyze and understand both th...
Visual Question Answering (VQA) is an extremely stimulating and challenging research area where Comp...
Video Question Answering (VideoQA) aims to answer natural language questions according to the given ...
In recent years, visual question answering (VQA) has become topical. The premise of VQA's significan...
Embodied question answering is the task of asking a robot about objects in a 3D environment. The rob...
Given visual input and a natural language question about it, the visual question answering (VQA) tas...
Visual Question Answering (VQA) has witnessed tremendous progress in recent years. However, most eff...
We propose a new task to benchmark scene understanding of embodied agents: Situated Question Answeri...
Visual Question Answering (VQA) is the task of answering questions based on an image. The field has ...
Our code, datasets and trained models are available at https://antoyang.github.io/just-ask.html. Inte...
We propose the task of free-form and open-ended Visual Question Answering (VQA). Given an i...
We propose a scalable approach to learn video-based question answering (QA): to answer a free-form n...
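The abstracts above share a common classifier-style formulation: encode the visual input (an image, a video, or an embodied agent's egocentric observations), encode the natural language question, fuse the two representations, and score a fixed set of candidate answers. The following is a minimal illustrative sketch of that formulation in PyTorch; the module names, layer sizes, and multiplicative fusion are assumptions chosen for brevity and do not reproduce any of the listed papers' architectures.

import torch
import torch.nn as nn


class TinyVQAModel(nn.Module):
    """Illustrative image + question -> answer-logits model (not any paper's method)."""

    def __init__(self, vocab_size=1000, num_answers=100, hidden=256):
        super().__init__()
        # Image branch: a small convolutional encoder with global pooling.
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden),
        )
        # Question branch: word embeddings processed by an LSTM.
        self.embed = nn.Embedding(vocab_size, 128, padding_idx=0)
        self.lstm = nn.LSTM(128, hidden, batch_first=True)
        # Fusion by elementwise product, then a linear classifier over candidate answers.
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, image, question_tokens):
        img_feat = self.image_encoder(image)                 # (B, hidden)
        _, (h_n, _) = self.lstm(self.embed(question_tokens)) # final hidden state summarizes the question
        q_feat = h_n[-1]                                     # (B, hidden)
        fused = img_feat * q_feat                            # simple multiplicative fusion
        return self.classifier(fused)                        # answer logits, shape (B, num_answers)


if __name__ == "__main__":
    model = TinyVQAModel()
    image = torch.randn(2, 3, 224, 224)          # a batch of two RGB images
    question = torch.randint(1, 1000, (2, 12))   # two tokenized 12-word questions
    logits = model(image, question)
    print(logits.shape)  # torch.Size([2, 100]): scores over the candidate answer vocabulary

VideoQA, 3D-QA, and EQA variants described above differ mainly in the visual branch (e.g., a frame- or point-cloud encoder instead of a single-image CNN) and, for embodied settings, in coupling this answering module with a navigation policy; the question encoder and answer classifier play the same role.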