Transformer networks are able to capture patterns in data coming from many domains (text, images, videos, proteins, etc.) with little or no change to architecture components. We perform a theoretical analysis of the core component responsible for signal propagation between elements, i.e. the self-attention matrix. We ask the following question: Can the self-attention matrix approximate arbitrary patterns? How small a query dimension d is required for such approximation? Our first result shows that deciding whether approximation of a given pattern is possible is NP-hard for any fixed d greater than one. In practice, the self-attention matrix typically exhibits two properties: it is sparse, and it changes dynamically depending on t...
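For context on the object analyzed in the abstract above, the sketch below computes a standard scaled dot-product self-attention matrix for n tokens with query dimension d. This is only a minimal illustration of the generic construction; the projection matrices, sizes, and variable names are assumptions for the example, not the specific setup studied in the paper.

    import numpy as np

    def self_attention_matrix(X, W_q, W_k):
        # X: (n, d_model) token embeddings; W_q, W_k: (d_model, d) query/key projections.
        Q = X @ W_q                                    # queries, shape (n, d)
        K = X @ W_k                                    # keys, shape (n, d)
        d = Q.shape[-1]                                # query dimension
        scores = Q @ K.T / np.sqrt(d)                  # raw attention logits, shape (n, n)
        scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
        A = np.exp(scores)
        A /= A.sum(axis=-1, keepdims=True)             # row-wise softmax: each row sums to 1
        return A                                       # the n x n self-attention (pattern) matrix

    # Illustrative usage: n = 6 tokens, model width 16, query dimension d = 2
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 16))
    W_q = rng.normal(size=(16, 2))
    W_k = rng.normal(size=(16, 2))
    A = self_attention_matrix(X, W_q, W_k)
    print(A.shape, A.sum(axis=-1))                     # (6, 6), rows sum to 1

The question raised in the abstract is whether such a matrix A, with d kept small, can be made to approximate an arbitrary target pattern over the n x n entries.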
Transformers have made progress in miscellaneous tasks, but suffer from quadratic computational and ...
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-at...
Self-attention networks have shown remarkable progress in computer vision tasks such as image classi...
Self-attention, an architectural motif designed to model long-range interactions in sequential data,...
To overcome the quadratic cost of self-attention, recent works have proposed various sparse attentio...
In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approa...
Attention-based architectures have become ubiquitous in machine learning, yet our understanding of t...
Recent years have seen the vast potential of the Transformer model, as it is arguably the first gene...
Transformers have emerged as a powerful tool for a broad range of natural language processing tasks....
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by ...
Considering the spectral properties of images, we propose a new self-attention mechanism with highly...
In this paper, we aim to enhance self-attention (SA) mechanism for deep metric learning in visual pe...
To improve the robustness of transformer neural networks used for temporal-dynamics prediction of ch...
Transformers have recently shown superior performances on various vision tasks. The large, sometimes...