The Vision Transformer architecture has proven competitive in computer vision (CV), where it has dethroned convolution-based networks on several benchmarks. Nevertheless, Convolutional Neural Networks (CNN) remain the preferred architecture for the representation module in Reinforcement Learning. In this work, we study pretraining a Vision Transformer using several state-of-the-art self-supervised methods and assess the data-efficiency gains from this training framework. We propose a new self-supervised learning method called TOV-VICReg that extends VICReg to better capture temporal relations between observations by adding a temporal order verification task. Furthermore, we evaluate the resulting encoders with Atari games in a ...
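To make the idea behind TOV-VICReg concrete, the sketch below combines the standard VICReg objective (invariance, variance, and covariance terms) with a temporal order verification term. This is a minimal illustration, not the authors' implementation: the `TOVHead` design, the coefficients, and the choice to pose order verification as binary classification of ordered versus reversed embedding pairs are all assumptions for illustration.

```python
# Minimal sketch of a TOV-VICReg-style loss (assumed formulation, not the
# authors' code): VICReg terms on two augmented views of observation o_t,
# plus a hypothetical temporal-order-verification head contrasting the
# ordered pair (z_t, z_{t+1}) against the reversed pair (z_{t+1}, z_t).
import torch
import torch.nn as nn
import torch.nn.functional as F


def vicreg_terms(z_a, z_b, gamma=1.0, eps=1e-4):
    """Standard VICReg invariance / variance / covariance terms."""
    n, d = z_a.shape
    invariance = F.mse_loss(z_a, z_b)

    # Hinge on the per-dimension standard deviation of each branch.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    variance = F.relu(gamma - std_a).mean() + F.relu(gamma - std_b).mean()

    # Penalize off-diagonal entries of the covariance matrix of each branch.
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    covariance = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return invariance, variance, covariance


class TOVHead(nn.Module):
    """Hypothetical head: predicts whether a pair of embeddings is in
    correct temporal order (1) or reversed (0)."""

    def __init__(self, dim):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, z_first, z_second):
        return self.classifier(torch.cat([z_first, z_second], dim=-1)).squeeze(-1)


def tov_vicreg_loss(z_t_a, z_t_b, z_next, tov_head,
                    lam=25.0, mu=25.0, nu=1.0, beta=1.0):
    """z_t_a, z_t_b: embeddings of two augmented views of the observation at
    time t; z_next: embedding of the observation at t+1 from the same
    trajectory. Coefficients are illustrative."""
    inv, var, cov = vicreg_terms(z_t_a, z_t_b)

    logits_ordered = tov_head(z_t_a, z_next)    # true temporal order -> label 1
    logits_reversed = tov_head(z_next, z_t_a)   # reversed order      -> label 0
    logits = torch.cat([logits_ordered, logits_reversed])
    labels = torch.cat([torch.ones_like(logits_ordered),
                        torch.zeros_like(logits_reversed)])
    tov = F.binary_cross_entropy_with_logits(logits, labels)

    return lam * inv + mu * var + nu * cov + beta * tov
```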