The most compute-intensive stage of deep neural network (DNN) training is matrix multiplication, where the multiply-accumulate (MAC) operator is key. To reduce training costs, we consider using low-precision arithmetic for MAC operations. While low-precision training has been investigated in prior work, the focus has been on reducing the number of bits in weights or activations without compromising accuracy. In contrast, the focus in this paper is on implementation details beyond weight or activation width that affect area and accuracy. In particular, we investigate the impact of fixed- versus floating-point representations, multiplier rounding, and floating-point exceptional value support. Results suggest that (1) low-pre...
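As a rough illustration of the kind of implementation detail this abstract enumerates (a sketch for exposition, not the paper's actual design), a fixed-point MAC with a selectable multiplier rounding mode could look like the following; the function name, bit widths, and rounding options are assumptions:

    def fixed_point_mac(acc, a, b, frac_bits=8, rounding="nearest"):
        """One fixed-point multiply-accumulate step (illustrative sketch).

        acc, a, b are integers holding real values scaled by 2**frac_bits.
        The exact double-width product a*b carries 2*frac_bits fractional
        bits; it is narrowed back to frac_bits either by round-to-nearest
        or by truncation, the two multiplier rounding choices this line of
        work compares.
        """
        prod = a * b  # exact double-width product
        if rounding == "nearest":
            prod = (prod + (1 << (frac_bits - 1))) >> frac_bits
        else:  # "truncate": simply drop the low-order bits
            prod >>= frac_bits
        return acc + prod

    scale = 1 << 8  # 8 fractional bits
    acc = fixed_point_mac(0, int(1.5 * scale), int(2.25 * scale))
    print(acc / scale)  # 3.375 (= 1.5 * 2.25), up to rounding

Truncation saves the adder that round-to-nearest needs, which is exactly the sort of area-versus-accuracy trade-off the abstract points at.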
Due to their potential to reduce silicon area or boost throughput, low-precision computations were w...
DNNs have been finding a growing number of applications including image classification, speech recog...
Resource requirements for hardware acceleration of neural network inference i...
Due to limited size, cost and power, embedded devices do not offer the same computational throughput...
Hardware accelerators for Deep Neural Networks (DNNs) that use reduced precision parameters are more...
Mixed-precision (MP) arithmetic combining both single- and half-precision operands has been successf...
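As a minimal sketch of what such mixed-precision MAC arithmetic means in practice (an illustration under assumed conventions, not the scheme of any particular paper listed here), half-precision products can be widened and accumulated in single precision:

    import numpy as np

    def mp_dot(a16, b16):
        # Mixed-precision dot product: float16 operands, float32 accumulator.
        # Widening each product before accumulation limits rounding-error
        # growth over long sums, the usual motivation for MP MAC units.
        acc = np.float32(0.0)
        for x, y in zip(a16, b16):
            acc += np.float32(x) * np.float32(y)
        return acc

    a = np.array([0.1, 0.2, 0.3], dtype=np.float16)
    b = np.array([1.0, 2.0, 3.0], dtype=np.float16)
    print(mp_dot(a, b))  # approximately 1.4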
Training large-scale deep neural networks (DNNs) currently requires a significant amount of energy, ...
Low-precision formats have recently driven major breakthroughs in neural network (NN) training and i...
The unprecedented growth in DNN model complexity, size and the amount of training data has led to a...
The current trend in deep learning has come with an enormous computational need for billions of Mul...
Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from...
Graphics Processing Units (GPUs) offer the possibility to execute floating-poi...
When training early-stage deep neural networks (DNNs), generating intermediate features via convolut...