Vision transformers have recently achieved great success on various computer vision tasks; nevertheless, their high model complexity makes them challenging to deploy on resource-constrained devices. Quantization is an effective approach to reducing model complexity, and data-free quantization, which can address data privacy and security concerns during model deployment, has received widespread interest. Unfortunately, existing methods such as BN regularization were designed for convolutional neural networks and cannot be applied to vision transformers, whose model architectures differ significantly. In this paper, we propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers, to enable the generatio...
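The abstract is cut off before the method details, but to illustrate what a patch-similarity objective for data-free sample generation might look like, here is a minimal sketch. Everything in it (the cosine-similarity matrix over patch tokens, the variance-based diversity objective, and the function names) is an illustrative assumption, not the paper's confirmed procedure.

```python
import torch
import torch.nn.functional as F

def patch_similarity_loss(patch_tokens: torch.Tensor) -> torch.Tensor:
    """Hypothetical diversity objective over patch tokens.

    patch_tokens: (B, N, D) patch embeddings taken from a ViT block.
    Real images tend to yield diverse patch-to-patch similarities, so we
    reward a spread-out cosine-similarity matrix as a proxy for realism.
    """
    x = F.normalize(patch_tokens, dim=-1)       # unit-norm patch embeddings
    sim = x @ x.transpose(1, 2)                 # (B, N, N) cosine similarities
    return -sim.var(dim=(1, 2)).mean()          # maximize similarity spread

def generate_samples(model, steps=500, lr=0.1, shape=(8, 3, 224, 224)):
    """Optimize Gaussian noise so its patch similarities look 'natural'."""
    images = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([images], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        tokens = model.forward_features(images)  # assumes a timm-style ViT
        loss = patch_similarity_loss(tokens)
        loss.backward()
        opt.step()
    return images.detach()
```

The synthetic images produced this way would then stand in for real calibration data when computing quantization parameters, which is what makes the pipeline data-free.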
Vision transformers (ViT) have demonstrated impressive performance across numerous machine vision ta...
We explore the capability of plain Vision Transformers (ViTs) for semantic segmentation and propose ...
Although transformer networks have recently been employed in various vision tasks with outstanding perfo...
Data-free quantization can potentially address data privacy and security concerns in model compressi...
Quantization is one of the most effective methods for compressing neural networks and has achieved gr...
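To ground the discussion, the sketch below shows uniform affine quantization, the basic scheme most such works build on: floats are mapped to a small integer grid via a scale and a zero point. The function names and the 8-bit setting are illustrative assumptions.

```python
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    """Uniform affine quantization: map floats to integers in [0, 2^b - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = max((x.max() - x.min()) / (qmax - qmin), 1e-8)  # float step per level
    zero_point = int(round(qmin - x.min() / scale))         # integer mapped to 0.0
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)
```

The compression comes from storing and computing with the integer tensor `q`; the reconstruction error `x - dequantize(q, scale, zero_point)` is what quantization methods, data-driven or data-free, try to keep small.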
Network quantization significantly reduces model inference complexity and has been widely used in re...
In this paper, we propose a fully differentiable quantization method for vision transformer (ViT) na...
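The rounding step in quantization has zero gradient almost everywhere, so "fully differentiable" methods typically rely on the straight-through estimator (STE) or a soft relaxation of it. Since this abstract is truncated before the details, the sketch below shows only the generic STE trick with a learnable step size; the class name and parameterization are assumptions, not this paper's method.

```python
import torch

class STEQuantizer(torch.nn.Module):
    """Fake-quantize with a learnable step size; gradients flow via STE."""

    def __init__(self, num_bits: int = 8):
        super().__init__()
        self.qmax = 2 ** (num_bits - 1) - 1       # symmetric signed range
        self.step = torch.nn.Parameter(torch.tensor(0.05))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scaled = x / self.step
        q = torch.clamp(torch.round(scaled), -self.qmax - 1, self.qmax)
        # Straight-through estimator: forward uses round(), backward treats
        # it as identity, so both x and the step size receive gradients.
        q = scaled + (q - scaled).detach()
        return q * self.step
```

Because the step size is an ordinary parameter, it can be trained jointly with the network weights by standard backpropagation, which is what makes the overall quantization scheme end-to-end differentiable.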
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tack...
We introduce token-consistent stochastic layers in vision transformers, without causing any severe d...
Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision app...
Recent self-supervised learning (SSL) methods have shown impressive results in learning visual repre...
The vision transformer (ViT) has advanced to the cutting edge of visual recognition tasks. Transf...
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...
Transformer design is the de facto standard for natural language processing tasks. The success of th...
Vision Transformers (ViT) have been shown to attain highly competitive performance for a wide range ...