Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call {\em Transient Global} (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
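To make the TGlobal idea concrete, here is a minimal NumPy sketch of the attention pattern described above; it is not the LongT5 implementation. It uses a single head with no learned projections, and the function name and the `radius` and `block` parameters are illustrative assumptions: each token attends to a local window plus "transient global" tokens obtained by mean-pooling fixed-size blocks of the input.

```python
import numpy as np

def tglobal_attention(x, radius=2, block=4):
    """Simplified Transient Global (TGlobal) attention sketch.

    Each token attends to (a) neighbors within `radius` positions and
    (b) per-block "transient global" tokens formed on the fly by
    mean-pooling `block`-sized chunks of the input (no side-inputs).
    Single head, no learned projections, for illustration only.
    """
    n, d = x.shape
    # Transient global tokens: one mean-pooled summary per block.
    g = x[: n - n % block].reshape(-1, block, d).mean(axis=1)

    keys = np.concatenate([x, g], axis=0)           # local + global keys
    scores = x @ keys.T / np.sqrt(d)                # (n, n + n // block)

    # Local tokens are visible only inside the window; globals always are.
    idx = np.arange(n)
    local = np.abs(idx[:, None] - idx[None, :]) <= radius
    mask = np.concatenate([local, np.ones((n, len(g)), dtype=bool)], axis=1)
    scores = np.where(mask, scores, -1e9)

    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ keys                                 # (n, d) outputs

out = tglobal_attention(np.random.randn(16, 8))
print(out.shape)  # (16, 8)
```

Because the global tokens are recomputed from the input itself at every layer, no task-specific "global" side-inputs have to be designed, which is the practical difference from ETC.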
Even though many efficient transformers have been proposed, only a few such models are available for s...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences...
Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing tasks...
Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks...
Transformer models achieve state-of-the-art performance on a wide range of NLP tasks. They, however, suffer from the quadratic complexity of self-attention with respect to sequence length...
Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding...
An ideal length-extrapolatable Transformer language model can handle sequences longer than the training length...
Transformer deep models have gained a lot of traction in Neural Text Summarization...
Transformers have achieved success in both language and vision domains. However, it is prohibitively expensive to scale them to long sequences...
T5 Model (@patrickvonplaten, @thomwolf) T5 is a powerful encoder-decoder model that formats every NLP problem into a text-to-text format...
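As a quick illustration of the text-to-text framing, here is a usage sketch with the Hugging Face transformers API; the checkpoint name and task prefix are just examples, not a prescription:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Every task is expressed as plain text; a task prefix selects the behavior.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```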
Since their release, Transformers have revolutionized many fields from Natural Language Understanding...
Document-level Neural Machine Translation (DocNMT) has been proven crucial for handling discourse phenomena...
Transformer models cannot easily scale to long sequences due to their $O(N^2)$ time and space complexity...
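As a back-of-the-envelope illustration of that quadratic cost (our arithmetic, not the cited paper's): at sequence length $N = 16{,}384$, a single fp32 attention matrix holds $N^2 \approx 2.7 \times 10^8$ scores, i.e. $16{,}384^2 \times 4$ bytes $= 1$ GiB, per attention head and per layer, before any other activations are stored.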
State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies...
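For readers unfamiliar with SSMs, the sketch below shows the discrete linear recurrence they are built on, $x_t = A x_{t-1} + B u_t$, $y_t = C x_t$. It is a toy illustration, not S4 or any particular published layer, and the matrices here are random placeholders:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Toy discrete linear state space model:
    x_t = A x_{t-1} + B u_t,  y_t = C x_t.
    Real SSM layers use learned, structured A and a convolutional or
    parallel-scan form for speed; this loop only shows the recurrence.
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                 # one step per input element
        x = A @ x + B * u_t       # state carries long-range information
        ys.append(C @ x)          # scalar readout per step
    return np.array(ys)

rng = np.random.default_rng(0)
n = 4                             # state size (assumption)
A = np.eye(n) * 0.9               # stable transition matrix (placeholder)
B = rng.standard_normal(n)
C = rng.standard_normal(n)
y = ssm_scan(A, B, C, rng.standard_normal(100))
print(y.shape)  # (100,)
```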
While Transformer language models (LMs) are state-of-the-art for information extraction, long text is challenging for them...