Transformer networks are able to capture patterns in data coming from many domains (text, images, videos, proteins, etc.) with little or no change to architecture components. We perform a theoretical analysis of the core component responsible for signal propagation between elements, i.e. the self-attention matrix. We ask the following question: Can the self-attention matrix approximate arbitrary patterns? How small a query dimension d is required for such approximation? Our first result shows that deciding whether approximation of a given pattern is possible is NP-hard for any fixed d greater than one. In practice, the self-attention matrix typically exhibits two properties: it is sparse, and it changes dynamically depending on t...
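For context on the object analyzed in the abstract above, the sketch below computes a standard scaled dot-product self-attention matrix for n tokens with query dimension d. This is only a minimal illustration of the generic construction; the projection matrices, sizes, and variable names are assumptions for the example, not the specific setup studied in the paper.

    import numpy as np

    def self_attention_matrix(X, W_q, W_k):
        # X: (n, d_model) token embeddings; W_q, W_k: (d_model, d) query/key projections.
        Q = X @ W_q                                    # queries, shape (n, d)
        K = X @ W_k                                    # keys, shape (n, d)
        d = Q.shape[-1]                                # query dimension
        scores = Q @ K.T / np.sqrt(d)                  # raw attention logits, shape (n, n)
        scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
        A = np.exp(scores)
        A /= A.sum(axis=-1, keepdims=True)             # row-wise softmax: each row sums to 1
        return A                                       # the n x n self-attention (pattern) matrix

    # Illustrative usage: n = 6 tokens, model width 16, query dimension d = 2
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 16))
    W_q = rng.normal(size=(16, 2))
    W_k = rng.normal(size=(16, 2))
    A = self_attention_matrix(X, W_q, W_k)
    print(A.shape, A.sum(axis=-1))                     # (6, 6), rows sum to 1

The question raised in the abstract is whether such a matrix A, with d kept small, can be made to approximate an arbitrary target pattern over the n x n entries.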
Transformers have made progress in miscellaneous tasks, but suffer from quadratic computational and ...
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-at...
Self-attention networks have shown remarkable progress in computer vision tasks such as image classi...
Self-attention, an architectural motif designed to model long-range interactions in sequential data,...
To overcome the quadratic cost of self-attention, recent works have proposed various sparse attentio...
In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approa...
Attention-based architectures have become ubiquitous in machine learning, yet our understanding of t...
Recent years have seen the vast potential of the Transformer model, as it is arguably the first gene...
Transformers have emerged as a powerful tool for a broad range of natural language processing tasks....
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by ...
Considering the spectral properties of images, we propose a new self-attention mechanism with highly...
In this paper, we aim to enhance self-attention (SA) mechanism for deep metric learning in visual pe...
To improve the robustness of transformer neural networks used for temporal-dynamics prediction of ch...
Transformers have recently shown superior performances on various vision tasks. The large, sometimes...