Hardware accelerators for neural network inference can exploit common data properties for performance gains and reduced memory bandwidth. These properties include using narrower data types at a coarse or fine granularity, as well as the ability to skip and compress zero values and bits. This work investigates whether these properties persist in: (1) more recent and accurate image classification networks, (2) models for other applications, such as computational imaging, (3) Long Short-Term Memory (LSTM) models for natural language processing, and (4) quantized models. We propose a greedy approach to fixed-point quantization that achieves bitwidths between 2 and 13 bits for most networks, with an overall average of 6.5 bits. Sparsity, althou...
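The greedy fixed-point quantization procedure is not spelled out in this excerpt. As an illustration only, the sketch below shows one plausible reading: a per-layer greedy search that lowers each layer's bitwidth while a validation metric stays within a tolerance of the full-precision baseline. The names (`quantize_fixed_point`, `greedy_bitwidth_search`, `evaluate`) and the tolerance criterion are assumptions for this sketch, not the paper's specification.

```python
import numpy as np

def quantize_fixed_point(weights, bits):
    """Uniform symmetric fixed-point quantization of a weight tensor to `bits` bits."""
    if bits >= 32:
        return weights
    max_abs = float(np.max(np.abs(weights))) or 1.0
    levels = 2 ** (bits - 1) - 1            # signed symmetric range
    scale = max_abs / levels
    return np.round(weights / scale) * scale

def greedy_bitwidth_search(layers, evaluate, tolerance=0.01, min_bits=2, max_bits=16):
    """Greedily lower each layer's bitwidth while accuracy stays within `tolerance`.

    layers   : dict of layer name -> weight ndarray (full precision)
    evaluate : callable(dict of weights) -> validation accuracy in [0, 1]
    Returns the chosen per-layer bitwidths and the quantized weights.
    """
    baseline = evaluate(layers)
    bitwidths = {name: max_bits for name in layers}
    current = dict(layers)

    for name in layers:                      # one greedy pass over the layers
        for bits in range(max_bits, min_bits - 1, -1):
            trial = dict(current)
            trial[name] = quantize_fixed_point(layers[name], bits)
            if baseline - evaluate(trial) <= tolerance:
                bitwidths[name] = bits       # accept the lower precision
                current = trial
            else:
                break                        # this layer cannot go lower; next layer
    return bitwidths, current
```

Under this reading, the per-layer bitwidths the search settles on would correspond to the 2-to-13-bit range reported above, with the average taken across layers or networks.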