We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors. The key innovation in PULP-NN is a set of kernels for quantized neural network inference, targeting byte and sub-byte data types, down to INT-1, tuned for the recent trend toward aggressive quantization in deep neural network inference. The proposed library exploits both the digital signal processing extensions available in the PULP RISC-V processors and the cluster’s parallelism, achieving up to 15.5 MACs/cycle on INT-8 and improving performance by up to 63× with respect to a sequential implementation on a single RISC-V core implementing the baseline RV32IMC ISA. Using PULP-NN, a CIFAR-10 network on an octa-core c...
On-chip DNN inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, a...
Deep neural networks have achieved impressive results in computer vision and machine learning. Unfor...
An open challenge in making Internet-of-Things sensor nodes "smart" and self-adaptive is to enable ...
We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V b...
The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized so...
Heavily quantized fixed-point arithmetic is becoming a common approach to deploy Convolutional Neura...
Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors int...
Machine Learning (ML) functions are becoming ubiquitous in latency- and privacy-sensitive IoT applic...
High energy efficiency and low memory footprint are the key requirements for the deployment of deep ...
Deep Neural Networks (DNNs) are computation-hungry algorithms that demand hardware platforms capable of meeti...
The growing number of low-power smart devices in the Internet of Things is coupled with the concept ...
Over the last ten years, the rise of deep learning has redefined the state-of-the-art in many comput...