Self-attention mechanisms model long-range context by using pairwise attention between all input tokens. In doing so, they assume a fixed attention granularity defined by the individual tokens (e.g., text characters or image pixels), which may not be optimal for modeling complex dependencies at higher levels. In this paper, we propose ContextPool to address this problem by adapting the attention granularity for each token. Inspired by the success of ConvNets that are combined with pooling to capture long-range dependencies, we learn to pool neighboring features for each token before computing attention in a given attention layer. The pooling weights and support size are adaptively determined, allowing the pooled features to encode meaningfu...
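The mechanism sketched in this abstract (per-token pooling of neighboring features, with learned pooling weights and support size, applied before attention) can be illustrated with a minimal PyTorch-style sketch. The module name ContextPool, the helper predict_pool, and the Gaussian-window parameterization of the support size are illustrative assumptions here, not the paper's exact formulation.

# Hypothetical sketch of per-token context pooling before attention (PyTorch).
# The Gaussian window and the per-token importance weight are assumptions made
# for illustration; the pooled output would feed the Q/K/V projections of a
# standard attention layer.
import torch
import torch.nn as nn

class ContextPool(nn.Module):
    def __init__(self, dim, max_support=9):
        super().__init__()
        self.max_support = max_support
        # Predict, for each token, a pooling weight (importance) and a support size.
        self.predict_pool = nn.Linear(dim, 2)

    def forward(self, x):
        # x: (batch, seq_len, dim) token features entering an attention layer
        b, n, d = x.shape
        params = self.predict_pool(x)                 # (b, n, 2)
        weight = torch.sigmoid(params[..., 0:1])      # per-token importance in [0, 1]
        support = torch.sigmoid(params[..., 1:2])     # relative support size in (0, 1)

        # Build a Gaussian window per token whose width follows the predicted support.
        offsets = torch.arange(-(self.max_support // 2), self.max_support // 2 + 1,
                               device=x.device, dtype=x.dtype)           # (k,)
        sigma = support * (self.max_support / 2) + 1e-4                  # (b, n, 1)
        window = torch.exp(-(offsets ** 2) / (2 * sigma ** 2))           # (b, n, k)
        window = window / window.sum(dim=-1, keepdim=True)

        # Gather weighted neighbors and pool them for every token position.
        idx = torch.arange(n, device=x.device).unsqueeze(-1) + offsets.long()  # (n, k)
        idx = idx.clamp(0, n - 1)
        neighbors = (x * weight)[:, idx]              # (b, n, k, d)
        pooled = (window.unsqueeze(-1) * neighbors).sum(dim=2)           # (b, n, d)
        return pooled                                 # input to Q/K/V projections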
Large language models (LLMs) exhibit remarkable performance improvement through in-context learning ...
Recurrent n...
The capability of the self-attention mechanism to model the long-range dependencies has catapulted i...
The self-attention model has shown its flexibility in parallel computation and its effectiveness in mode...
In this paper, we aim to enhance self-attention (SA) mechanism for deep metric learning in visual pe...
Many NLP tasks require processing long contexts beyond the length limit of pretrained models. In ord...
The paper presents a scalable approach for learning distributed representations over individual toke...
In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNN...
Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative objec...
Large pre-trained vision-language models like CLIP have shown great potential in learning representa...
Object detection is an important component of computer vision. Most of the recent successfu...
We address the problem of learning on sets of features, motivated by the need ...
Self-attention, an architectural motif designed to model long-range interactions in sequential data,...
The Squeeze-and-Excitation (SE) block presents a channel attention mechanism for modeling global context...
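For reference, the SE recipe mentioned above (squeeze via global average pooling, excitation via a bottleneck MLP with sigmoid gating, then channel-wise rescaling) can be written as a short sketch; the reduction ratio of 16 is the commonly used default and is assumed here for illustration.

# Minimal sketch of a Squeeze-and-Excitation block (PyTorch).
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Bottleneck MLP that maps the channel summary to per-channel gates.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels, height, width)
        b, c, _, _ = x.shape
        # Squeeze: global average pooling summarizes each channel into one value.
        s = x.mean(dim=(2, 3))                # (b, c)
        # Excitation: produce per-channel gates in [0, 1].
        gates = self.fc(s).view(b, c, 1, 1)
        # Recalibrate the feature map channel-wise.
        return x * gates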
Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a ...