We introduce the Block-Recurrent Transformer, which applies a transformer layer in a recurrent fashion along a sequence, and has linear complexity with respect to sequence length. Our recurrent cell operates on blocks of tokens rather than single tokens during training, and leverages parallel computation within a block in order to make efficient use of accelerator hardware. The cell itself is strikingly simple. It is merely a transformer layer: it uses self-attention and cross-attention to efficiently compute a recurrent function over a large set of state vectors and tokens. Our design was inspired in part by LSTM cells, and it uses LSTM-style gates, but it scales the typical LSTM cell up by several orders of magnitude. Our implementation o...
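A minimal, single-head PyTorch sketch of such a block-recurrent cell is given below, purely to make the description above concrete. It assumes several simplifications relative to the paper: one attention head, no positional encodings or causal masking, and a simplified convex-combination gate standing in for the exact LSTM-style gating. Names such as BlockRecurrentCell and attend are illustrative and do not come from the authors' implementation.

# Minimal single-head sketch of a block-recurrent cell (illustrative only).
# Simplifications: no positional encodings, no causal masking, one head,
# and a simplified gate; names here are not taken from the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def attend(q, k, v):
    # Scaled dot-product attention over full matrices (no masking).
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

class BlockRecurrentCell(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.q_tok = nn.Linear(d_model, d_model)
        self.q_state = nn.Linear(d_model, d_model)
        self.kv_tok = nn.Linear(d_model, 2 * d_model)
        self.kv_state = nn.Linear(d_model, 2 * d_model)
        self.proj_tok = nn.Linear(2 * d_model, d_model)
        self.proj_state = nn.Linear(2 * d_model, d_model)
        # LSTM-style gates controlling how much of the proposed update
        # is written into the recurrent state.
        self.z_gate = nn.Linear(2 * d_model, d_model)  # candidate path
        self.f_gate = nn.Linear(2 * d_model, d_model)  # forget path

    def forward(self, tokens, state):
        # tokens: (block_len, d_model), state: (num_state, d_model)
        k_t, v_t = self.kv_tok(tokens).chunk(2, dim=-1)
        k_s, v_s = self.kv_state(state).chunk(2, dim=-1)

        # Token update: self-attention over the block plus cross-attention
        # to the current state, combined by a linear projection.
        q_t = self.q_tok(tokens)
        tok_out = tokens + self.proj_tok(
            torch.cat([attend(q_t, k_t, v_t), attend(q_t, k_s, v_s)], dim=-1))

        # State update: self-attention over states plus cross-attention to
        # the token block, written into the state through the gates.
        q_s = self.q_state(state)
        h = torch.cat([attend(q_s, k_s, v_s), attend(q_s, k_t, v_t)], dim=-1)
        z = torch.tanh(self.z_gate(h))
        f = torch.sigmoid(self.f_gate(h))
        new_state = f * state + (1 - f) * z

        return tok_out, new_state

# Usage: the cell is applied once per block, carrying the state across blocks.
cell = BlockRecurrentCell(d_model=64)
state = torch.zeros(16, 64)               # 16 recurrent state vectors
for block in torch.randn(4, 128, 64):     # 4 blocks of 128 tokens each
    out, state = cell(block, state)

Because the cell consumes a whole block of tokens per step and carries only a fixed-size set of state vectors between blocks, the number of sequential steps grows with the number of blocks rather than with the number of tokens, which is what gives the linear complexity in sequence length described above.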
Training large transformer models is one of the most important computational challenges of modern AI...
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkab...
Recent work has shown that either (1) increasing the input length or (2) increasing model size can i...
Transformer-based models show their effectiveness across multiple domains and tasks. The self-attent...
State space models (SSMs) have shown impressive results on tasks that require modeling long-range de...
Pretrained transformer models have demonstrated remarkable performance across various natural langua...
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language ...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in ha...
Deep learning has achieved great success in many sequence learning tasks such as machine translation...
Transformers in their common form are inherently limited to operate on whole token sequences rather ...
Originally developed for natural language problems, transformer models have recently been widely use...
This document aims to be a self-contained, mathematically precise overview of transformer architectu...
The transformer architecture and its variants have presented remarkable success across many machine learning ...
There has been an explosion of interest in designing high-performance Transformers. While Transforme...
Existing large language models have to run K times to generate a sequence of K tokens. In this paper...
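As a point of reference, here is a minimal sketch of the standard autoregressive decoding loop that the sentence above describes: generating K tokens takes K sequential forward passes, because each new token must be fed back in before the next one can be predicted. Both greedy_decode and the ToyLM stand-in below are hypothetical illustrations, not part of any particular library or of the paper's proposed method.

# Standard greedy autoregressive decoding: one forward pass per new token,
# so K tokens cost K sequential model calls. ToyLM is a toy stand-in model.
import torch

def greedy_decode(model, prompt_ids, k):
    ids = list(prompt_ids)
    for _ in range(k):                       # one forward pass per new token
        logits = model(torch.tensor([ids]))  # (1, len(ids), vocab_size)
        next_id = int(logits[0, -1].argmax())
        ids.append(next_id)                  # next step depends on this token
    return ids[len(prompt_ids):]

class ToyLM(torch.nn.Module):
    # Toy language model returning per-position logits, just to run the loop.
    def __init__(self, vocab_size=100, d=32):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, d)
        self.head = torch.nn.Linear(d, vocab_size)
    def forward(self, ids):
        return self.head(self.emb(ids))

print(greedy_decode(ToyLM(), prompt_ids=[1, 2, 3], k=5))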