The severe on-chip memory limitations are currently preventing the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even when leveraging an effective 8-bit quantization scheme. To tackle this issue, in this paper we present an automated mixed-precision quantization flow based on the HAQ framework but tailored to the memory and computational characteristics of MCU devices. Specifically, a Reinforcement Learning agent searches for the best uniform quantization levels, among 2, 4, and 8 bits, for individual weight and activation tensors, under tight constraints on RAM and FLASH embedded memory sizes. We conduct an experimental analysis on MobileNetV1, MobileNetV2 and MNasNet models for Imagen...
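The search described above assigns one of a few uniform bit-widths to each tensor. As a minimal sketch of what a single candidate assignment does, the following shows per-tensor symmetric uniform quantization to 2, 4, or 8 bits; the function name and the choice of a single per-tensor scale are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def quantize_uniform(tensor, bits):
    # Hypothetical helper: symmetric uniform quantization with one
    # scale per tensor, as searched over by the RL agent (2/4/8 bits).
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits, 7 for 4 bits
    scale = np.abs(tensor).max() / qmax   # single per-tensor scale factor
    q = np.clip(np.round(tensor / scale), -qmax - 1, qmax)
    return q.astype(np.int8), scale       # int8 container even for 2/4-bit codes

# Fewer bits -> coarser grid -> larger reconstruction error.
w = np.array([-1.0, -0.5, 0.0, 0.25, 0.75, 1.0], dtype=np.float32)
for b in (2, 4, 8):
    q, s = quantize_uniform(w, b)
    err = np.abs(w - q * s).max()
```

Storing sub-byte codes in an int8 container keeps the example simple; on an actual MCU, 2- and 4-bit values would be bit-packed to realize the FLASH savings the search optimizes for.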
To bridge the ever-increasing gap between deep neural networks' complexity and hardware capability, ...
The compression of deep learning models is of fundamental importance in deploying such models to edg...
Quantization is a promising approach for reducing the inference time and memory footprint of neural ...
Quantization of deep neural networks is a common way to optimize the networks for deployment on ener...
Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligenc...
Abstract Model quantization is a widely used technique to compress and accelerate deep neural netwo...
Quantization of neural networks has been one of the most popular techniques to compress models for e...
Recently, there has been a push to perform deep learning (DL) computations on the edge rather than t...
The large computing and memory cost of deep neural networks (DNNs) often precludes their use in reso...
Low-precision integer arithmetic is a necessary ingredient for enabling Deep Learning inference on t...
With increased network downsizing and cost minimization in deployment of neural network (NN) models,...
Machine learning, and specifically Deep Neural Networks (DNNs) impact all parts of daily life. Altho...