The multi-head self-attention mechanism of the transformer model has been thoroughly investigated recently. In one vein of study, researchers seek to understand why and how transformers work. In another, researchers propose new attention augmentation methods to make transformers more accurate, efficient, and interpretable. In this paper, we combine these two lines of research in a human-in-the-loop pipeline that first discovers important task-specific attention patterns. Those patterns are then injected not only into smaller models but also into the original model. The benefits of our pipeline and the discovered patterns are demonstrated in two case studies, on extractive summarization and topic segmentation. After discovering inte...
Transformers have become an indispensable module for text generation models since their great succes...
As the key component in Transformer models, the attention mechanism has shown its great power in learnin...
Recent years have seen the vast potential of the Transformer model, as it is arguably the first gene...
The transformer multi-head self-attention mechanism has been thoroughly investigated recently. On o...
Transformer models are revolutionizing machine learning, but their inner workings remain mysterious....
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkab...
The attention mechanism is considered the backbone of the widely-used Transformer architecture. It c...
Transformers are the state-of-the-art for machine translation and grammar error correction. One of t...
Attention mechanisms have played a crucial role in the development of complex ...
Vision Transformers are very popular nowadays due to their state-of-the-art performance in several c...
This report introduces the Attention Visualizer package, which is crafted to visually illustrate the...
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-at...
Large pretrained language models using the transformer neural network architecture are becoming a do...
The deep learning architecture associated with ChatGPT and related generative AI products is known a...
Transformer trackers have achieved impressive advancements recently, where the attention mechanism p...