Using mixed low-precision formats in multiply-accumulate (MAC) units for DNN training

Tatsumi, Mariko

Publication date

May 2022

Publisher

University of British Columbia Press

Abstract

Due to limited size, cost and power, embedded devices do not offer the same computational throughput as graphics processing units (GPUs) for training Deep Neural Networks (DNNs). The most compute-intensive stage of multilayer perceptron (MLP) and convolutional neural network (CNN) training is the general matrix multiply (GEMM) kernel which is executed three times per layer in each iteration: once for forward-propagation and twice for back-propagation. To reduce the number of operations, techniques such as distillation (to reduce model size) and pruning (to introduce sparsity) are commonly applied. This thesis considers another technique, where the computational effort of each operation is reduced using low-precision arithmetic. While the u...
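The count of three GEMMs per layer can be illustrated with a minimal sketch of one fully connected layer. This is not code from the thesis: the helper names (`gemm`, `transpose`, `forward`, `backward`) and the toy values are hypothetical, and bias terms, activations, and any low-precision number format are omitted. It assumes the usual formulation in which the forward pass computes Y = XW and the backward pass computes dX = dY·Wᵀ and dW = Xᵀ·dY.

```python
# Hypothetical illustration only; names and values are not from the thesis.
gemm_calls = 0  # counts how many matrix multiplies one training step performs


def gemm(a, b):
    """Plain matrix multiply on lists of lists; increments the call counter."""
    global gemm_calls
    gemm_calls += 1
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def forward(x, w):
    # Forward propagation: Y = X W (one GEMM).
    return gemm(x, w)


def backward(x, w, dy):
    # Back-propagation: gradient w.r.t. the input, dX = dY W^T (second GEMM),
    # and gradient w.r.t. the weights, dW = X^T dY (third GEMM).
    dx = gemm(dy, transpose(w))
    dw = gemm(transpose(x), dy)
    return dx, dw


# Toy layer: batch of 2 samples, 2 input features, 3 output features.
x = [[1.0, 2.0], [3.0, 4.0]]
w = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
dy = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # assumed upstream gradient

y = forward(x, w)
dx, dw = backward(x, w, dy)
print(gemm_calls)  # 3
```

Every multiply-add inside `gemm` corresponds to one MAC operation, which is the unit of work that low-precision arithmetic aims to make cheaper.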

