Contrastive learning has shown promising potential in self-supervised spatio-temporal representation learning. Most works naively sample different clips to construct positive and negative pairs. However, we observe that this formulation biases the model towards the static background scene. The underlying reasons are twofold. First, scene differences are usually more noticeable and easier to discriminate than motion differences. Second, clips sampled from the same video often share similar backgrounds but contain distinct motions; simply regarding them as positive pairs draws the model towards the static background rather than the motion pattern. To tackle this challenge, this paper presents a novel dual contrastive formulation. Concret...
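For context, the naive clip-based formulation this abstract critiques can be sketched as a standard InfoNCE objective: two clips drawn from the same video are treated as a positive pair, and clips from other videos in the batch serve as negatives. The sketch below is a minimal illustration under that assumption; the encoder, tensor shapes, and temperature are hypothetical and it is not the paper's dual contrastive method.

```python
# Minimal sketch of the standard clip-based contrastive setup (InfoNCE),
# where clips from the same video are positives and other videos in the
# batch are negatives. Names/shapes are illustrative assumptions.
import torch
import torch.nn.functional as F

def clip_infonce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z_a, z_b: (batch, dim) embeddings of two clips sampled from each video."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature                      # (batch, batch) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)    # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage with a hypothetical video encoder:
#   z_a = encoder(clips_a)   # first clip from each video
#   z_b = encoder(clips_b)   # second clip from the same videos
#   loss = clip_infonce_loss(z_a, z_b)
```

Because the two clips typically share the same background, an encoder trained with this objective can satisfy it by matching scenes rather than motions, which is the bias the paper addresses.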
Video representation learning has been successful in video-text pre-training for zero-shot transfer,...
Self-supervised learning has demonstrated remarkable capability in representation learning for skele...
In this paper, we propose a self-supervised method for video representation le...
We present ConCur, a contrastive video representation learning method that uses curriculum learning ...
As the most essential property in a video, motion information is critical to a robust and generalize...
We propose a self-supervised learning approach for videos that learns representations of both the RG...
We present MaCLR, a novel method to explicitly perform cross-modal self-supervised video representat...
In low-level video analyses, effective representations are important to derive the correspondences b...
Recent self-supervised video representation learning methods focus on maximizing the similarity betw...
The objective of this paper is visual-only self-supervised video representation learning. We make th...
The quality of the image representations obtained from self-supervised learning depends strongly on ...
Self-supervised skeleton-based action recognition with contrastive learning has attracted much atten...
Recent advances in supervised deep learning methods are enabling remote measurements of photoplethys...
Contrastive Learning has recently received interest due to its success in self-supervised representa...
Learning time-series representations when only unlabeled data or few labeled samples are available c...