The severe on-chip memory limitations of tiny MicroController Units (MCUs) currently prevent the deployment of the most accurate Deep Neural Network (DNN) models, even when an effective 8-bit quantization scheme is leveraged. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored to the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, and 8 bits, for individual weight and activation tensors, under tight constraints on RAM and FLASH embedded memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MNasNet models for Imagen...
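The abstract above describes searching over uniform 2-, 4-, and 8-bit quantization levels per tensor under memory constraints. The following is a minimal illustrative sketch (not the actual HAQ-based flow, and the function names are my own) of uniform symmetric quantization at those bit-widths and of the packed storage cost that such a search would trade off against accuracy:

```python
import numpy as np

def quantize_uniform(x, bits):
    """Uniformly quantize a tensor onto a signed integer grid of `bits` bits,
    then dequantize back to floats; returns (dequantized tensor, scale)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale, scale

def memory_bytes(shape, bits):
    """Packed storage cost of a tensor at the given bit-width."""
    return int(np.prod(shape)) * bits // 8

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float32)

# Lower bit-widths shrink the memory footprint but raise quantization error,
# which is the trade-off the RL agent navigates per tensor.
for bits in (2, 4, 8):
    wq, _ = quantize_uniform(w, bits)
    mse = float(np.mean((w - wq) ** 2))
    print(f"{bits}-bit: {memory_bytes(w.shape, bits)} bytes, MSE {mse:.4f}")
```

A per-tensor bit-width assignment is feasible only if the summed `memory_bytes` of all weight tensors fits in FLASH and the largest co-resident activation tensors fit in RAM.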
Low bit-width Quantized Neural Networks (QNNs) enable deployment of complex machine learning models ...
Quantization of neural networks has been one of the most popular techniques to compress models for e...
Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligenc...
The large computing and memory cost of deep neural networks (DNNs) often precludes their use in reso...
Quantization of deep neural networks is a common way to optimize the networks for deployment on ener...
The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized so...
Quantization emerges as one of the most promising approaches for deploying advanced deep models on r...
Recently, there has been a push to perform deep learning (DL) computations on the edge rather than t...
Abstract Model quantization is a widely used technique to compress and accelerate deep neural netwo...
Deep Neural Network (DNN) inference based on quantized narrow-precision integer data represents a pr...
Low-precision integer arithmetic is a necessary ingredient for enabling Deep Learning inference on t...
To bridge the ever increasing gap between deep neural networks' complexity and hardware capability, ...
Quantized neural networks are well known for reducing latency, power consumption, and model size wit...