We present PULP-NN, a multicore computing library for a parallel ultra-low-power cluster of RISC-V based processors. The library consists of a set of kernels for Quantized Neural Network (QNN) inference on edge devices, targeting byte and sub-byte data types, down to INT-1. Our software solution exploits the digital signal processing (DSP) extensions available in the PULP RISC-V processors and the cluster's parallelism, improving performance by up to 63× with respect to a baseline implementation on a single RISC-V core implementing the RV32IMC ISA. Using the PULP-NN routines, the inference of a CIFAR-10 QNN model runs in 30× and 19.6× fewer clock cycles than the current state-of-the-art ARM CMSIS-NN library, running on an STM32L4 and an STM3...
In recent years, the need for the efficient deployment of Neural Networks (NN) on edge devices has b...
Over the last ten years, the rise of deep learning has redefined the state-of-the-art in many comput...
Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series...
We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cl...
The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized so...
Heavily quantized fixed-point arithmetic is becoming a common approach to deploy Convolutional Neura...
Embedding intelligence in extreme edge devices allows distilling raw data acquired from sensors int...
Machine Learning (ML) functions are becoming ubiquitous in latency- and privacy-sensitive IoT applic...
High energy efficiency and low memory footprint are the key requirements for the deployment of deep ...
The growing number of low-power smart devices in the Internet of Things is coupled with the concept ...
On-chip DNN inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, a...
An open challenge in making Internet-of-Things sensor nodes "smart'' and self-adaptive is to enable ...