Different video understanding tasks are typically treated in isolation, often with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). In wearable cameras, however, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks -- hand-object manipulation, navigation through space, or human-human interaction -- that unfold continuously, driven by the person's goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of th...
We tackle the task of reconstructing hand-object interactions from short video clips. Given an input...
Wearable devices and affective computing have gained popularity in recent times. Egocentric vide...
People often watch videos on the web to learn how to cook new recipes, assemble furniture or repair ...
In this report, we present our approach and empirical results of applying masked autoencoders in two...
To enable progress towards egocentric agents capable of understanding everyday tasks specified in na...
Videos captured from wearable cameras, known as egocentric videos, create a continuous record of hum...
Procedure learning involves identifying the key-steps and determining their logical order to perform...
Estimating 3D human motion from an egocentric video sequence is critical to human behavior understan...
With the surge in attention to Egocentric Hand-Object Interaction (Ego-HOI), large-scale datasets su...
Human intention is a temporal sequence of human actions to achieve a goal. Determining human intent...
The topic of this dissertation is the analysis and understanding of egocentric (first-person) videos...
We study how visual representations pre-trained on diverse human video data can enable data-efficien...
In this report, we present the ReLER@ZJU1 submission to the Ego4D Moment Queries Challenge in ECCV 2...
Multi-modal datasets in artificial intelligence (AI) often capture a third-person perspective, but o...