Although neural network quantization is essential for the computation and memory efficiency of embedded neural network accelerators, simple post-training quantization incurs unacceptable accuracy degradation on some important models targeting embedded systems, such as MobileNets. While explicit quantization-aware training or re-training after quantization can often reclaim lost accuracy, this is not always possible or convenient. We present an alternative approach to compressing such difficult neural networks, using a novel variant of the ZFP lossy floating-point compression algorithm to compress both model weights and inter-layer activations, and demonstrate that it can be efficiently implemented on an embedded FPGA...
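For context on the "simple post-training quantization" this abstract contrasts against, the following is a minimal, hypothetical sketch of per-tensor affine quantization to 8-bit integers (pure Python, not the paper's ZFP-based method; all names are illustrative):

```python
# Hypothetical sketch: per-tensor affine post-training quantization.
# A float tensor is mapped to unsigned 8-bit codes via a scale and
# zero point derived from its observed min/max range.

def quantize(values, num_bits=8):
    """Quantize a list of floats to integer codes in [0, 2^num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant tensors
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map integer codes back to approximate float values."""
    return [(qi - zero_point) * scale for qi in q]
```

The round-trip error of this scheme is bounded by the scale (one quantization step), which is exactly what becomes problematic for models such as MobileNets, whose per-channel weight ranges vary widely within a layer.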
Autonomous cars are complex applications that need powerful hardware machines to be able to function...
Real-time inference of deep convolutional neural networks (CNNs) on embedded systems and SoCs would ...
The increase in sophistication of neural network models in recent years has exponentially expanded m...
In the wake of the success of convolutional neural networks in image classification, object recognit...
Over the last decade, various deep neural network models have achieved great success in image recogn...
Deep neural networks (DNNs) are a key technology nowadays and the main driving factor for many recen...
Hardware accelerators such as GPUs and FPGAs can often provide enormous computing capabilities and p...
We investigate the compression of deep neural networks by quantizing their weights and activations i...
Parallel hardware accelerators, for example Graphics Processor Units, have limited on-chip memory ca...
Over the past decade, machine learning (ML) with deep neural networks (DNNs) has become ext...
The training of deep neural networks (DNNs) requires intensive resources both for computation and fo...
Quantization of deep neural networks is a common way to optimize the networks for deployment on ener...
The severe on-chip memory limitations are currently preventing the deployment of the most acc...
Convolutional Neural Networks (CNNs) were created for image classification tasks. Quickly, they were...