Low bit-width Quantized Neural Networks (QNNs) enable the deployment of complex machine learning models on constrained devices such as microcontrollers (MCUs) by reducing their memory footprint. Fine-grained asymmetric quantization (i.e., different bit-widths assigned to weights and activations on a tensor-by-tensor basis) is a particularly interesting scheme for maximizing accuracy under a tight memory constraint. However, the lack of sub-byte instruction set architecture (ISA) support in state-of-the-art (SoA) microprocessors makes it hard to fully exploit this extreme quantization paradigm on embedded MCUs. Supporting sub-byte and asymmetric QNNs would require many precision formats and an exorbitant amount of opcode space. In this work, we attack this problem w...
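As background for the scheme described above, the following is a minimal Python/NumPy sketch of per-tensor asymmetric (affine) quantization with a freely chosen bit-width. The function names and the 2-bit-weight/4-bit-activation split are illustrative assumptions for this sketch, not code from the paper:

```python
import numpy as np

def quantize_asymmetric(x: np.ndarray, bits: int):
    # Per-tensor affine quantization: q = round(x / scale) + zero_point,
    # with q clipped to the unsigned range [0, 2^bits - 1].
    qmax = (1 << bits) - 1
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    # Recover a floating-point approximation of the original tensor.
    return scale * (q.astype(np.float32) - zero_point)

# Fine-grained choice (hypothetical): 2-bit weights, 4-bit activations for one layer.
w = np.random.randn(64).astype(np.float32)
a = np.random.rand(64).astype(np.float32)
qw, sw, zw = quantize_asymmetric(w, bits=2)
qa, sa, za = quantize_asymmetric(a, bits=4)
print(np.abs(w - dequantize(qw, sw, zw)).max())  # error grows as bit-width shrinks
```

Note that although the 2-bit and 4-bit values fit in 2 and 4 bits respectively, they are stored here in full bytes; packing several sub-byte values per register and operating on them efficiently is exactly what SoA ISAs lack, which is the gap the abstract identifies.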
A lot of recent progress has been made in ultra low-bit quantization, promising...
Nowadays, two groundbreaking factors are emerging in neural networks. Firstly, there is the RISC-V o...
We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V b...
Heavily quantized fixed-point arithmetic is becoming a common approach to deploy Convolutional Neura...
The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized so...
Recently, there has been a push to perform deep learning (DL) computations on the edge rather than t...
The severe on-chip memory limitations are currently preventing the deployment of the most acc...
Deep Neural Network (DNN) inference based on quantized narrow-precision integer data represents a pr...
High energy efficiency and low memory footprint are the key requirements for the deployment of deep ...
Model quantization is a widely used technique to compress and accelerate deep neural netwo...
Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors int...
The large computing and memory cost of deep neural networks (DNNs) often precludes their use in reso...
With the surging popularity of edge computing, the need to efficiently perform neural network infere...