In the past few years, transformers have achieved promising performance on various computer vision tasks. Unfortunately, the immense inference overhead of most existing vision transformers prevents them from being deployed on edge devices such as cell phones and smart watches. Knowledge distillation is a widely used paradigm for compressing cumbersome architectures by transferring their information to a compact student. However, most existing methods are designed for convolutional neural networks (CNNs) and do not fully exploit the characteristics of vision transformers (ViTs). In this paper, we utilize patch-level information and propose a fine-grained manifold distillation method. Specifically, we train a tiny student model to match a pre-trained...
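The teacher-student setup this abstract builds on is typically implemented with a temperature-softened KL-divergence term between teacher and student logits. A minimal sketch of that generic (Hinton-style) distillation loss follows; it is an illustration of the standard paradigm, not the paper's patch-level manifold loss, whose details are truncated above. All function names here are illustrative.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-softened softmax; subtract the max for numerical stability.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) over temperature-softened class distributions,
    scaled by T^2 so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, T)  # teacher's soft targets
    q = softmax(student_logits, T)  # student's predictions
    return T * T * sum(pi * (math.log(pi) - math.log(qi))
                       for pi, qi in zip(p, q))
```

In training, this term is usually mixed with the ordinary cross-entropy on ground-truth labels; the temperature `T` controls how much of the teacher's "dark knowledge" (relative probabilities of wrong classes) the student sees.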
We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratic...
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-t...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
Pure transformers have shown great potential for vision tasks recently. However, their accuracy in s...
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tack...
Large pre-trained transformers are on top of contemporary semantic segmentation benchmarks, but come...
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...
The transformer models have shown promising effectiveness in dealing with various vision tasks. Howe...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
Transformer attracts much attention because of its ability to learn global relations and superior pe...
Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerab...
Vision Transformers are very popular nowadays due to their state-of-the-art performance in several c...
Vision transformers (ViT) have demonstrated impressive performance across numerous machine vision ta...
The recent advances in image transformers have shown impressive results and have largely closed the ...
This paper investigates two techniques for developing efficient self-supervised vision transformers ...