Recent years have seen intensive research interest in training deep neural networks (DNNs) more efficiently via quantization-based compression, which aids DNN training in two ways: (1) activations are quantized to shrink memory consumption, and (2) gradients are quantized to reduce communication cost. However, existing methods mostly use a uniform mechanism that quantizes values evenly. Such a scheme can incur a large quantization variance and slow convergence in practice. In this work, we introduce TinyScript, which applies a non-uniform quantization algorithm to both activations and gradients. TinyScript models the original values with a family of Weibull distributions and searches for "quantizatio...
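The contrast between uniform and non-uniform quantization can be illustrated with a minimal sketch. The code below is not TinyScript's algorithm: its "knob" search is only approximated by a generic Lloyd-Max iteration, and the Weibull shape parameter (1.5) and 4-bit level count are arbitrary assumptions made for illustration. It compares evenly spaced levels against levels fit to Weibull-distributed values and reports the quantization error of each.

# Minimal sketch (not the paper's exact method): uniform vs. non-uniform
# quantization of Weibull-distributed values. Requires NumPy and SciPy.
import numpy as np
from scipy.stats import weibull_min

def uniform_levels(x, num_levels):
    # Evenly spaced levels over the observed range (the "uniform mechanism").
    return np.linspace(x.min(), x.max(), num_levels)

def lloyd_max_levels(x, num_levels, iters=50):
    # Non-uniform levels via Lloyd-Max: alternately assign each sample to its
    # nearest level, then move each level to the mean of its assigned samples.
    # This stands in for a variance-minimizing level search; it is not the
    # TinyScript procedure.
    levels = np.quantile(x, np.linspace(0.0, 1.0, num_levels))
    for _ in range(iters):
        idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
        for k in range(num_levels):
            if np.any(idx == k):
                levels[k] = x[idx == k].mean()
    return np.sort(levels)

def quantize(x, levels):
    # Map every value to its nearest quantization level.
    idx = np.abs(x[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "activations" drawn from an assumed Weibull distribution.
    x = weibull_min(c=1.5, scale=1.0).rvs(size=50_000, random_state=rng)
    for name, levels in [("uniform", uniform_levels(x, 16)),
                         ("non-uniform", lloyd_max_levels(x, 16))]:
        mse = np.mean((x - quantize(x, levels)) ** 2)
        print(f"{name:11s} 4-bit quantization MSE: {mse:.5f}")

Because Weibull-distributed values concentrate mass near the mode, levels placed non-uniformly typically yield a noticeably lower quantization error than evenly spaced levels at the same bit width, which is the intuition behind preferring a distribution-aware scheme.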
Quantized neural networks typically require smaller memory footprints and lower computation complexi...
Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classifi...
We investigate the compression of deep neural networks by quantizing their weights and activations i...
The increase in sophistication of neural network models in recent years has exponentially expanded m...
We study the dynamics of gradient descent in learning neural networks for classification problems. U...
Quantization has shown stunning efficiency on deep neural networks, especially for portable devices w...
This work considers a challenging Deep Neural Network (DNN) quantization task that seeks to train qua...
When training neural networks with simulated quantization, we observe that quantized weights can, ra...
While deep neural networks are a highly successful model class, their large memory footprint puts co...
Quantization of deep neural networks is essential for efficient implementations. Low-preci...
The advancement of deep models poses great challenges to real-world deployment because of the limite...
Network quantization is an effective solution to compress deep neural networks for practical usage. ...
With numerous breakthroughs over the past several years, deep learning (DL) techniques have transfor...
Neural network quantization has become an important research area due to its great impact on deploym...