Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models. In this paper, we present a new model, called LongT5, with which we explore the effects of scaling both the input length and model size at the same time. Specifically, we integrated attention ideas from long-input transformers (ETC), and adopted pre-training strategies from summarization pre-training (PEGASUS) into the scalable T5 architecture. The result is a new attention mechanism we call {\em Transient Global} (TGlobal), which mimics ETC's local/global attention mechanism, but without requiring additional side-inputs. We are able to achieve state-of-the-art results on several summarization tasks and outperform the original T5 models on question answering tasks.
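To make the TGlobal idea concrete, here is a minimal NumPy sketch of the attention pattern described above; it is not the LongT5 implementation. It uses a single head with no learned projections, and the function name and the `radius` and `block` parameters are illustrative assumptions: each token attends to a local window plus "transient global" tokens obtained by mean-pooling fixed-size blocks of the input.

```python
import numpy as np

def tglobal_attention(x, radius=2, block=4):
    """Simplified Transient Global (TGlobal) attention sketch.

    Each token attends to (a) neighbors within `radius` positions and
    (b) per-block "transient global" tokens formed on the fly by
    mean-pooling `block`-sized chunks of the input (no side-inputs).
    Single head, no learned projections, for illustration only.
    """
    n, d = x.shape
    # Transient global tokens: one mean-pooled summary per block.
    g = x[: n - n % block].reshape(-1, block, d).mean(axis=1)

    keys = np.concatenate([x, g], axis=0)           # local + global keys
    scores = x @ keys.T / np.sqrt(d)                # (n, n + n // block)

    # Local tokens are visible only inside the window; globals always are.
    idx = np.arange(n)
    local = np.abs(idx[:, None] - idx[None, :]) <= radius
    mask = np.concatenate([local, np.ones((n, len(g)), dtype=bool)], axis=1)
    scores = np.where(mask, scores, -1e9)

    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ keys                                 # (n, d) outputs

out = tglobal_attention(np.random.randn(16, 8))
print(out.shape)  # (16, 8)
```

Because the global tokens are recomputed from the input itself at every layer, no task-specific "global" side-inputs have to be designed, which is the practical difference from ETC.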
Even though many efficient transformers have been proposed, only a few such models are available for s...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in handling long sequences...
Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing tasks...
Pretrained transformer models have demonstrated remarkable performance across various natural language processing tasks...
Transformer models achieve state-of-the-art performance on a wide range of NLP tasks. They, however, suffer from the quadratic complexity of self-attention with respect to sequence length...
Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding...
An ideal length-extrapolatable Transformer language model can handle sequences longer than the training length...
Transformer deep models have gained a lot of traction in Neural Text Summarization...
Transformers have achieved success in both language and vision domains. However, it is prohibitively expensive to scale them to long sequences...
T5 Model (@patrickvonplaten, @thomwolf) T5 is a powerful encoder-decoder model that formats every NLP problem into a text-to-text format...
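As a quick illustration of the text-to-text framing, here is a usage sketch with the Hugging Face transformers API; the checkpoint name and task prefix are just examples, not a prescription:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Every task is expressed as plain text; a task prefix selects the behavior.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```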
Since their release, Transformers have revolutionized many fields from Natural Language Understanding...
Document-level Neural Machine Translation (DocNMT) has been proven crucial for handling discourse phenomena...
Transformer models cannot easily scale to long sequences due to their $O(N^2)$ time and space complexity...
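As a back-of-the-envelope illustration of that quadratic cost (our arithmetic, not the cited paper's): at sequence length $N = 16{,}384$, a single fp32 attention matrix holds $N^2 \approx 2.7 \times 10^8$ scores, i.e. $16{,}384^2 \times 4$ bytes $= 1$ GiB, per attention head and per layer, before any other activations are stored.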
State space models (SSMs) have shown impressive results on tasks that require modeling long-range dependencies...
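For readers unfamiliar with SSMs, the sketch below shows the discrete linear recurrence they are built on, $x_t = A x_{t-1} + B u_t$, $y_t = C x_t$. It is a toy illustration, not S4 or any particular published layer, and the matrices here are random placeholders:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Toy discrete linear state space model:
    x_t = A x_{t-1} + B u_t,  y_t = C x_t.
    Real SSM layers use learned, structured A and a convolutional or
    parallel-scan form for speed; this loop only shows the recurrence.
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                 # one step per input element
        x = A @ x + B * u_t       # state carries long-range information
        ys.append(C @ x)          # scalar readout per step
    return np.array(ys)

rng = np.random.default_rng(0)
n = 4                             # state size (assumption)
A = np.eye(n) * 0.9               # stable transition matrix (placeholder)
B = rng.standard_normal(n)
C = rng.standard_normal(n)
y = ssm_scan(A, B, C, rng.standard_normal(100))
print(y.shape)  # (100,)
```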
While Transformer language models (LMs) are state-of-the-art for information extraction, long text is challenging for them...