The attention mechanism is the key to many state-of-the-art transformer-based models in Natural Language Processing and Computer Vision. These models are pretrained on large datasets, and their sizes are growing rapidly. At the same time, their computation and data-movement costs and on-chip memory demands are growing beyond the capabilities of edge devices. This thesis addresses these challenges by developing strategies to prune inconsequential attention scores efficiently and effectively. The attention score is the core of the attention mechanism in all transformer-based models: it measures the correlation between two tokens in a sequence. A low attention score indicates an unimportant correlation and minimal impact...
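The score-pruning idea described in this abstract can be sketched as follows. This is a minimal illustrative NumPy sketch of scaled dot-product attention with a naive magnitude threshold on the post-softmax scores; the function name, threshold value, and renormalization step are assumptions for illustration, not the thesis's actual algorithm.

```python
import numpy as np

def pruned_attention(Q, K, V, threshold=0.1):
    """Scaled dot-product attention with naive score pruning (illustrative).

    Q, K, V: (seq_len, d) arrays. `threshold` is a hypothetical cutoff:
    attention probabilities below it are treated as inconsequential.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # raw attention scores
    # numerically stable softmax over the key dimension
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # zero out low scores, then renormalize the surviving entries
    probs = np.where(probs < threshold, 0.0, probs)
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ V
```

In a hardware-oriented setting, the point of such pruning is that the zeroed entries let the accelerator skip both the corresponding multiply-accumulates and the data movement for the associated value rows.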
The Transformer architecture is ubiquitously used as the building block of large-scale autoregressiv...
Transformers are the state-of-the-art for machine translation and grammar error correction. One of t...
The attention mechanism plays a crucial role among the key technologies in transformer-based visual trac...
The attention mechanism is the key to many state-of-the-art transformer-based models in Natural Lang...
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-at...
Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer infer...
Recent years have seen the vast potential of the Transformer model, as it is arguably the first gene...
The self-attention mechanism is rapidly emerging as one of the most important key primit...
The attention mechanism is considered the backbone of the widely-used Transformer architecture. It c...
The study of specialized accelerators tailored for neural networks is becoming a promising topic in ...
Transformer trackers have achieved impressive advancements recently, where the attention mechanism p...
As the key component of Transformer models, the attention mechanism has shown great power in learnin...
The attention mechanism has become the dominant module in natural language processing models. It is comp...
In this paper, we propose that the dot product pairwise matching attention layer, which is widely us...
The quadratic computation complexity of self-attention has been a persistent challenge when applying...