In this paper, we propose a fully differentiable quantization method for the vision transformer (ViT), named Q-ViT, in which both the quantization scales and the bit-widths are learnable parameters. Specifically, based on our observation that the heads in ViT exhibit different degrees of quantization robustness, we leverage head-wise bit-widths to squeeze the size of Q-ViT while preserving performance. In addition, we propose a novel technique named switchable scale to resolve the convergence problem in the joint training of quantization scales and bit-widths. In this way, Q-ViT pushes the limits of ViT quantization to 3-bit without a heavy performance drop. Moreover, we analyze the quantization robustness of every architectural component of ViT and show that th...
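To make the idea of jointly learning quantization scales and bit-widths concrete, here is a minimal PyTorch sketch of a differentiable quantizer in this spirit: the scale and a continuous bit-width are both nn.Parameters, and the non-differentiable rounding steps are handled with the straight-through estimator. The class name, initial values, and exact parameterization are illustrative assumptions, not the authors' reference implementation; in particular, it omits the head-wise bit allocation and the switchable-scale technique.

```python
import torch
import torch.nn as nn


class LearnableQuantizer(nn.Module):
    """Symmetric uniform quantizer with a learnable scale and bit-width (sketch)."""

    def __init__(self, init_scale: float = 0.1, init_bits: float = 8.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init_scale))  # quantization step size
        self.bits = nn.Parameter(torch.tensor(init_bits))    # continuous bit-width

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Round the continuous bit-width with a straight-through estimator (STE)
        # so the gradient of the loss still reaches self.bits.
        b = self.bits + (self.bits.round() - self.bits).detach()
        qmax = 2.0 ** (b - 1) - 1  # largest signed level, e.g. 127 for 8 bits

        # Scale, clamp to the representable range, and round with the STE.
        # In this simple parameterization the bit-width receives gradient
        # through the clamping bounds, i.e. only when clipping is active.
        x_s = torch.clamp(x / self.scale, -qmax - 1, qmax)
        x_q = x_s + (x_s.round() - x_s).detach()
        return x_q * self.scale  # dequantize back to real values


# Usage: quantize the activations of one attention head; both parameters
# receive gradients from the task loss, so each head's bit-width can be
# trained (and later rounded) independently.
quant = LearnableQuantizer(init_scale=0.02)
head_act = torch.randn(4, 197, 64)  # (batch, tokens, head_dim)
loss = quant(head_act).pow(2).mean()
loss.backward()
print(quant.scale.grad, quant.bits.grad)
```

One quantizer instance per attention head, each with its own learnable bit-width, is the natural way to realize the head-wise allocation described above, with the learned continuous bit-widths rounded to integers for deployment.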
Recent work has explored the potential to adapt a pre-trained vision transformer (ViT) by updating o...
Vision transformers (ViT) have demonstrated impressive performance across numerous machine vision ta...
Recent advances in Vision Transformer (ViT) and its improved variants have shown that self-attention...
Network quantization significantly reduces model inference complexity and has been widely used in re...
Quantization, one of the most effective methods for compressing neural networks, has achieved gr...
Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision app...
We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratic...
Neural network quantization aims to accelerate and trim full-precision neural network models by usin...
Data-free quantization can potentially address data privacy and security concerns in model compressi...
Vision Transformers (ViT) have made many breakthroughs in computer vision tasks. However, considerab...
Vision Transformer (ViT) architectures are becoming increasingly popular and widely employed to tack...
Vision transformers (ViTs) have recently achieved success in many applications, but their intensive ...
Pretraining language models with next-token prediction on massive text corpora has delivered phenome...
Vision transformers have recently achieved great success on various computer vision tasks; nevertheles...
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of...