The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for selfsupervised representation learning on videos. This learns a dense encoding of spatio-temporal blocks by recurrently predicting future representations; Second, we propose a curriculum training scheme to predict further into the future with progressively less temporal context. This encourages the model to only encode slowly varying spatialtemporal signals, therefore leading to semantic representations; Third, we evaluate the approach by first training the DPC model on the Kinetics-400 dataset with selfsupervi...
Current deep learning methods for action recognition rely heavily on large scale labeled video datas...
In this thesis the problem of automatic human action recognition and localization in videos is studi...
In recent research, the self-supervised video representation learning methods have achieved improve...
The objective of this paper is self-supervised learning from video, in particular for representation...
International audienceIn this paper, we propose a self-supervised method for video representation le...
This paper proposes a novel pretext task to address the self-supervised video representation learnin...
International audienceContrastive Predictive Coding (CPC) (van den Oord et al., 2018) has been succe...
We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich...
PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspir...
While great strides have been made in using deep learning algorithms to solve supervised learning ta...
In this thesis, we investigate different representations and models for large-scale video understand...
Self-supervised video representation learning aimed at maximizing similarity between different tempo...
Recognizing actions is one of the important challenges in computer vision with respect to video data...
This paper instroduces an unsupervised framework to extract semantically rich features for video rep...
Spatio-temporal convolution often fails to learn motion dynamics in videos and thus an effective mot...
Current deep learning methods for action recognition rely heavily on large scale labeled video datas...
In this thesis the problem of automatic human action recognition and localization in videos is studi...
In recent research, the self-supervised video representation learning methods have achieved improve...
The objective of this paper is self-supervised learning from video, in particular for representation...
International audienceIn this paper, we propose a self-supervised method for video representation le...
This paper proposes a novel pretext task to address the self-supervised video representation learnin...
International audienceContrastive Predictive Coding (CPC) (van den Oord et al., 2018) has been succe...
We propose a novel self-supervised method, referred to as Video Cloze Procedure (VCP), to learn rich...
PredNet, a deep predictive coding network developed by Lotter et al., combines a biologically inspir...
While great strides have been made in using deep learning algorithms to solve supervised learning ta...
In this thesis, we investigate different representations and models for large-scale video understand...
Self-supervised video representation learning aimed at maximizing similarity between different tempo...
Recognizing actions is one of the important challenges in computer vision with respect to video data...
This paper instroduces an unsupervised framework to extract semantically rich features for video rep...
Spatio-temporal convolution often fails to learn motion dynamics in videos and thus an effective mot...
Current deep learning methods for action recognition rely heavily on large scale labeled video datas...
In this thesis the problem of automatic human action recognition and localization in videos is studi...
In recent research, the self-supervised video representation learning methods have achieved improve...