Hardware accelerators for neural network inference can exploit common data properties for performance gains and reduced memory bandwidth. These properties include the use of narrower data types at coarse or fine granularity, as well as the ability to skip and compress zero values and bits. This work investigates whether these properties persist in: (1) more recent and accurate image classification networks, (2) models for other applications, such as computational imaging, (3) Long Short-Term Memory (LSTM) models for natural language processing, and (4) quantized models. We propose a greedy approach for fixed-point quantization that achieves between 2 and 13 bits for most networks, with an overall average of 6.5 bits. Sparsity, although...
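To make the greedy fixed-point quantization idea concrete, here is a minimal sketch, assuming per-layer uniform symmetric quantization and a validation-accuracy oracle. The names `quantize_tensor`, `greedy_bitwidth_search`, and `evaluate_accuracy` are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def quantize_tensor(x, bits):
    """Uniform symmetric fixed-point quantization of x to the given bit width."""
    scale = np.max(np.abs(x)) / (2 ** (bits - 1) - 1)
    if scale == 0:
        return x.copy()
    return np.round(x / scale) * scale

def greedy_bitwidth_search(layers, evaluate_accuracy, baseline_acc,
                           tolerance=0.01, max_bits=16, min_bits=2):
    """Greedily lower each layer's bit width, one layer at a time, as long
    as accuracy stays within `tolerance` of the full-precision baseline.

    layers: dict mapping layer name -> np.ndarray of weights (mutated in place)
    evaluate_accuracy: callback returning validation accuracy of `layers`
    """
    bits = {name: max_bits for name in layers}
    for name in layers:
        original = layers[name].copy()
        for b in range(max_bits - 1, min_bits - 1, -1):
            layers[name][...] = quantize_tensor(original, b)
            if evaluate_accuracy(layers) >= baseline_acc - tolerance:
                bits[name] = b  # accept the narrower width and keep going
            else:
                # revert to the last acceptable width and move to the next layer
                layers[name][...] = quantize_tensor(original, bits[name])
                break
    return bits
```

Under these assumptions, a single greedy pass yields a per-layer bit-width assignment whose average can land well below the starting precision, consistent with the 2 to 13 bit range the abstract reports.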
The recent “Cambrian explosion” of Deep Learning (DL) algorithms in concert with the end of Moore’s ...
Deep learning has been empirically successful in recent years thanks to the extremely over-parameter...
Sparsity is commonly produced from model compression (i.e., pruning), which eliminates unnecessary p...
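As a hypothetical illustration of how pruning produces the sparsity described above, the sketch below zeroes out the smallest-magnitude weights of a layer until a target sparsity is reached. The function name and magnitude-threshold scheme are assumptions for illustration, not taken from the cited work.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Return a copy of `weights` with the smallest-magnitude entries
    zeroed so that roughly `sparsity` fraction of the tensor is zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```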
DNNs have been finding a growing number of applications including image classification, speech recog...
The growing energy and performance costs of deep learning have driven the community to reduce the si...
Machine learning, and specifically Deep Neural Networks (DNNs), impacts all parts of daily life. Altho...
Modern iterations of deep learning models contain millions (or billions) of unique parameters, each repr...
The increase in sophistication of neural network models in recent years has exponentially expanded m...
Over the last ten years, the rise of deep learning has redefined the state-of-the-art in many comput...
This thesis presents a few methods to accelerate the inference of Deep Neural Networks that are lar...
In this paper, we address the problem of reducing the memory footprint of conv...
Machine learning has achieved great success in recent years, especially the deep learning algorithms...