While post-training quantization owes much of its popularity to the fact that it avoids access to the original complete training dataset, its degraded performance also stems from this scarcity of images. To alleviate this limitation, in this paper we leverage the synthetic data introduced by zero-shot quantization together with the calibration dataset, and propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization. The method is based on two important properties of batch normalization statistics (BNS) that we observed in the deep layers of the trained network, i.e., inter-class separation and intra-class incohesion. To preserve this fine-grained distribution information: 1) We calculate the per-class BNS of the calibrat...
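As a rough illustration of the idea sketched in this abstract, the snippet below computes per-class, channel-wise batch-normalization-style statistics of deep-layer activations and an L2 alignment loss against reference statistics from a calibration set. This is a minimal sketch under assumed interfaces (feature tensors of shape (N, C, H, W), integer labels), not the FDDA authors' implementation; function names such as per_class_bns and bns_alignment_loss are hypothetical.

```python
import torch

def per_class_bns(features, labels, num_classes):
    """Per-class BN-style statistics (channel-wise mean and variance).

    features: (N, C, H, W) activations from a deep layer
    labels:   (N,) integer class labels
    Returns a dict {class_id: (mean, var)} with tensors of shape (C,).
    """
    stats = {}
    for c in range(num_classes):
        mask = labels == c
        if mask.sum() == 0:
            continue  # class absent from this batch
        feats = features[mask]                       # (n_c, C, H, W)
        mean = feats.mean(dim=(0, 2, 3))             # channel-wise mean
        var = feats.var(dim=(0, 2, 3), unbiased=False)
        stats[c] = (mean, var)
    return stats

def bns_alignment_loss(synth_stats, ref_stats):
    """L2 distance between per-class statistics of synthetic images and
    reference statistics computed on the calibration data."""
    loss = torch.tensor(0.0)
    for c, (ref_mean, ref_var) in ref_stats.items():
        if c not in synth_stats:
            continue
        mean, var = synth_stats[c]
        loss = loss + torch.norm(mean - ref_mean) + torch.norm(var - ref_var)
    return loss
```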
As a dominant paradigm, fine-tuning a pre-trained model on the target data is widely used in many de...
Quantization is a promising approach for reducing the inference time and memory footprint of neural ...
We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural network t...
Data-free quantization is a task that compresses the neural network to low bit-width without access ...
While neural networks have been remarkably successful in a wide array of applications, implementing ...
At present, the quantization methods of neural network models are mainly divided into post-trainin...
Zero-shot quantization is a promising approach for developing lightweight deep neural networks when ...
Network quantization has emerged as a promising method for model compression and inference accelerat...
Quantization of the weights and activations is one of the main methods to reduce the computational f...
Post-training quantization (PTQ) can reduce the memory footprint and latency for deep model inferenc...
We consider the post-training quantization problem, which discretizes the weights of pre-trained dee...
Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from...
Robust quantization improves the tolerance of networks for various implementations, allowing reliabl...
We explore calibration properties at various precisions for three architectures: ShuffleNetv2, Ghost...
Recent advances in deep learning methods such as LLMs and Diffusion models have created a need for i...