The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, showing that this form of supervised contrastive learning leads to a clear improvement in performance; (ii) we propose a novel self-supervised co-training scheme to improve the popular infoNCE loss, exploiting the complementary information from different views, RGB streams and optical flow, of the same data source by using one view to obtain positive class samples for the other; (iii) we thoroughly evaluate the quality of the learnt representation on two different downstream tasks...
Contrastive Learning has recently received interest due to its success in self-supervised representa...
This paper proposes a novel pretext task to address the self-supervised video representation learnin...
The dominant paradigm for learning video-text representations – noise contrastive learning – increas...
With the rapid advancement of deep learning techniques in computer vision, researchers have achieved...
This thesis presents a novel self-supervised approach of learning visual representations from videos...
The remarkable success of deep learning in various domains relies on the availability of large-scale...
International audienceIn this paper, we propose a self-supervised method for video representation le...
In the image domain, excellent representations can be learned by inducing invariance to content-pres...
In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled vid...
International audienceContrastive representation learning has proven to be an effective self-supervi...
The objective of this paper is self-supervised learning from video, in particular for representation...
Self-supervised learning in video involves learning representations without using high-cost labels, ...
Abstract. Recognizing visual scenes and activities is challenging: often visual cues alone are ambig...
We explored the possibility of improving cross-view matching performance with self-supervised learni...
We propose a self-supervised learning approach for videos that learns representations of both the RG...
Contrastive Learning has recently received interest due to its success in self-supervised representa...
This paper proposes a novel pretext task to address the self-supervised video representation learnin...
The dominant paradigm for learning video-text representations – noise contrastive learning – increas...
With the rapid advancement of deep learning techniques in computer vision, researchers have achieved...
This thesis presents a novel self-supervised approach of learning visual representations from videos...
The remarkable success of deep learning in various domains relies on the availability of large-scale...
International audienceIn this paper, we propose a self-supervised method for video representation le...
In the image domain, excellent representations can be learned by inducing invariance to content-pres...
In this paper, we focus on the self-supervised learning of visual correspondence using unlabeled vid...
International audienceContrastive representation learning has proven to be an effective self-supervi...
The objective of this paper is self-supervised learning from video, in particular for representation...
Self-supervised learning in video involves learning representations without using high-cost labels, ...
Abstract. Recognizing visual scenes and activities is challenging: often visual cues alone are ambig...
We explored the possibility of improving cross-view matching performance with self-supervised learni...
We propose a self-supervised learning approach for videos that learns representations of both the RG...
Contrastive Learning has recently received interest due to its success in self-supervised representa...
This paper proposes a novel pretext task to address the self-supervised video representation learnin...
The dominant paradigm for learning video-text representations – noise contrastive learning – increas...