Model quantization helps reduce the size and latency of deep neural networks. Mixed-precision quantization is favorable on customized hardware that supports arithmetic operations at multiple bit-widths, where it can achieve maximum efficiency. We propose a novel learning-based algorithm to derive mixed-precision models end-to-end under target computation and model-size constraints. During optimization, the bit-width of each layer / kernel in the model is held at a fractional status between two consecutive bit-widths, and can be adjusted gradually. With a differentiable regularization term, the resource constraints can be met during quantization-aware training, resulting in an optimized mixed-precision model. Our final models achieve comparable...
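To make the fractional bit-width idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: a continuous bit-width b is realized by linearly interpolating between quantization at the two neighboring integer bit-widths, so gradients flow to b, and a differentiable penalty pushes the (fractional) model size under a budget. The helper names (uniform_quantize, fractional_quantize, size_penalty), the symmetric quantizer, and the size-based cost model are assumptions for illustration; the paper's exact quantizer and resource model (e.g., computation cost) may differ.

import torch

def uniform_quantize(x, bits):
    # Symmetric uniform quantizer with a straight-through estimator (STE).
    # Hypothetical helper; real implementations typically also learn the
    # clipping range.
    bits = int(bits)                      # integer bit-width for this branch
    qmax = 2.0 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (q - x).detach()           # STE: identity gradient w.r.t. x

def fractional_quantize(x, b):
    # b is a learnable, continuous scalar bit-width. The output interpolates
    # between quantization at the two neighboring integer bit-widths, so the
    # gradient w.r.t. b comes from the interpolation weights.
    lo = torch.floor(b)
    w_hi = b - lo                          # fraction assigned to ceil(b) bits
    return (1.0 - w_hi) * uniform_quantize(x, lo) + \
           w_hi * uniform_quantize(x, lo + 1)

def size_penalty(bit_widths, num_params, budget_bits, lam=1e-4):
    # Differentiable regularizer: penalize the amount by which the
    # fractional model size (sum of bits * parameter count per layer)
    # exceeds the target budget. Added to the task loss during
    # quantization-aware training.
    cost = sum(b * n for b, n in zip(bit_widths, num_params))
    return lam * torch.relu(cost - budget_bits)

In such a scheme, the total training loss would be the task loss plus size_penalty(...), and after training each fractional bit-width would be rounded to the nearest integer and frozen to obtain the final mixed-precision model.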
At present, quantization methods for neural network models are mainly divided into post-trainin...
In recent years, Deep Neural Networks (DNNs) have been rapidly developed in various applications, tog...
Quantizing weights and activations of deep neural networks is essential for deploying them in resour...
Quantized neural networks are well known for reducing latency, power consumption, and model size wit...
Quantization of deep neural networks is a common way to optimize the networks for deployment on ener...
To bridge the ever-increasing gap between deep neural networks' complexity and hardware capability, ...
We consider the post-training quantization problem, which discretizes the weights of pre-trained dee...
The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to...
Neural networks have demonstrably achieved state-of-the-art accuracy using low-bitlength integer qua...
The large computing and memory cost of deep neural networks (DNNs) often precludes their use in reso...
Recently, there has been a push to perform deep learning (DL) computations on the edge rather than t...
Quantization of neural networks has been one of the most popular techniques to compress models for e...
The severe on-chip memory limitations are currently preventing the deployment of the most acc...
Low bit-width model quantization is highly desirable when deploying a deep neural network on mobile ...
We present any-precision deep neural networks (DNNs), which are trained with a new method that allow...