Different video understanding tasks are typically treated in isolation, often with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). In wearable cameras, however, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks -- hand-object manipulation, navigation through space, or human-human interaction -- that unfold continuously, driven by the person's goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of th...
We tackle the task of reconstructing hand-object interactions from short video clips. Given an input...
Wearable devices and affective computing have gained popularity in recent times. Egocentric vide...
People often watch videos on the web to learn how to cook new recipes, assemble furniture or repair ...
In this report, we present our approach and empirical results of applying masked autoencoders in two...
To enable progress towards egocentric agents capable of understanding everyday tasks specified in na...
Videos captured from wearable cameras, known as egocentric videos, create a continuous record of hum...
Procedure learning involves identifying the key-steps and determining their logical order to perform...
Estimating 3D human motion from an egocentric video sequence is critical to human behavior understan...
With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets su...
Human intention is a temporal sequence of human actions to achieve a goal. Determining human intent...
The topic of this dissertation is the analysis and understanding of egocentric (first-person) videos...
We study how visual representations pre-trained on diverse human video data can enable data-efficien...
In this report, we present the ReLER@ZJU1 submission to the Ego4D Moment Queries Challenge in ECCV 2...
Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but o...