Recently, low-precision deep learning accelerators (DLAs) have become popular due to their advantages in chip area and energy consumption, yet low-precision quantized models on these DLAs incur severe accuracy degradation. One way to achieve both high accuracy and efficient inference is to deploy high-precision neural networks on low-precision DLAs, which is rarely studied. In this paper, we propose the PArallel Low-precision Quantization (PalQuant) method, which approximates high-precision computations by learning parallel low-precision representations from scratch. In addition, we present a novel cyclic shuffle module to boost cross-group information communication between parallel low-precision groups. Extensive experiments demon...
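As a rough illustration of the idea named in the abstract above (parallel low-precision branches plus a cyclic shuffle between groups), the PyTorch-style snippet below is a minimal sketch under assumed names and structure; it is not the authors' PalQuant implementation. In particular, only the input activations are quantized here (weights stay full precision for brevity), and the cyclic shuffle is modeled as a channel roll by one group width.

```python
import torch
import torch.nn as nn


def quantize_act(x, bits):
    # Uniform quantization of activations clipped to [0, 1] (illustrative only).
    levels = 2 ** bits - 1
    return torch.round(x.clamp(0, 1) * levels) / levels


class ParallelLowPrecisionBlock(nn.Module):
    """Hypothetical sketch: several parallel low-bit branches whose outputs are
    summed to approximate a higher-precision computation, followed by a cyclic
    channel shuffle so the next layer mixes features across groups.
    Class and function names here are assumptions, not PalQuant's actual API."""

    def __init__(self, in_ch, out_ch, groups=4, bits=2):
        super().__init__()
        self.groups = groups
        self.bits = bits
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1) for _ in range(groups)]
        )

    def cyclic_shuffle(self, x):
        # Roll the channel dimension by one group width so every group's
        # features land in a different group's slot for the next layer.
        return torch.roll(x, shifts=x.shape[1] // self.groups, dims=1)

    def forward(self, x):
        # Each branch sees a low-bit view of the input; summing the branch
        # outputs stands in for the higher-precision computation.
        x_q = quantize_act(x, self.bits)
        y = sum(branch(x_q) for branch in self.branches)
        return self.cyclic_shuffle(torch.relu(y))


# Usage on a random feature map.
block = ParallelLowPrecisionBlock(16, 16, groups=4, bits=2)
print(block(torch.rand(1, 16, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```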
Recently, there has been a push to perform deep learning (DL) computations on the edge rather than t...
Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from...
Graph Neural Network (GNN) training and inference involve significant challenges of scalability with...
Deep neural networks (DNN) have achieved impressive success in multiple domains. Over the years, the...
Efficient machine learning implementations optimized for inference in hardware have wide-ranging ben...
Owing to the presence of large values, which we call outliers, conventional methods of quantization ...
The large computing and memory cost of deep neural networks (DNNs) often precludes their use in reso...
Deep learning is finding its way into high energy physics by replacing traditional Monte Carlo simul...
Hardware accelerators for Deep Neural Networks (DNNs) that use reduced precision parameters are more...
Although the quest for more accurate solutions is pushing deep learning research towards larger and ...
With the surging popularity of edge computing, the need to efficiently perform neural network infere...
Heavily quantized fixed-point arithmetic is becoming a common approach to deploy Convolutional Neura...
The severe on-chip memory limitations are currently preventing the deployment of the most accurate D...
We introduce a Power-of-Two low-bit post-training quantization (PTQ) method for deep neural network t...
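For context on the abstract above, the snippet below shows what a generic power-of-two quantizer looks like: values are snapped to signed powers of two so multiplications can become bit-shifts. This is a hedged, generic sketch; the exponent-range convention (one bit for sign, the rest for exponent levels) is an assumption and not necessarily the method this abstract proposes.

```python
import torch


def power_of_two_quantize(w, bits=4):
    """Generic power-of-two quantizer (illustrative only): map each value to
    sign(w) * 2^e, with the exponent e rounded to the nearest integer and
    clipped to a range of 2**(bits - 1) levels below the largest magnitude."""
    sign = torch.sign(w)
    # Round log2(|w|) to the nearest integer exponent; clamp to avoid log(0).
    exp = torch.round(torch.log2(w.abs().clamp(min=1e-8)))
    e_max = torch.floor(torch.log2(w.abs().max()))
    e_min = e_max - (2 ** (bits - 1) - 1)
    exp = exp.clamp(e_min, e_max)
    return sign * torch.pow(2.0, exp)


w = torch.randn(8)
print(power_of_two_quantize(w, bits=4))
```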
The increase in sophistication of neural network models in recent years has exponentially expanded m...