We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by replacing the self-attention sublayers with simple linear transformations that "mix" input tokens. These linear mixers, along with standard nonlinearities in feed-forward layers, prove competent at modeling semantic relationships in several text classification tasks. Most surprisingly, we find that replacing the self-attention sublayer in a Transformer encoder with a standard, unparameterized Fourier Transform achieves 92-97% of the accuracy of BERT counterparts on the GLUE benchmark, but trains 80% faster on GPUs and 70% faster on TPUs at standard 512 input lengths. At longer input lengths, our FNet model is significantly faster: when compared to...
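To make the token-mixing idea concrete, here is a minimal NumPy sketch of an FNet-style encoder block as described above: the self-attention sublayer is replaced by an unparameterized 2D discrete Fourier transform over the sequence and hidden dimensions (keeping only the real part), followed by the usual position-wise feed-forward sublayer. The helper names, shapes, and toy feed-forward sizes are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def fourier_mixing(x):
    """Token mixing without attention: 2D DFT over the sequence and hidden
    dimensions, keeping only the real part. No learned parameters."""
    return np.real(np.fft.fft2(x))  # x: (seq_len, hidden_dim)

def layer_norm(x, eps=1e-6):
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def encoder_block(x, w1, b1, w2, b2):
    """One encoder block: Fourier mixing stands in for self-attention, then a
    position-wise ReLU feed-forward sublayer; both wrapped in residual
    connections and layer norm."""
    x = layer_norm(x + fourier_mixing(x))
    ff = np.maximum(x @ w1 + b1, 0.0) @ w2 + b2
    return layer_norm(x + ff)

# Toy usage (assumed sizes): 8 tokens, hidden size 16, feed-forward size 64.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))
w1, b1 = 0.1 * rng.normal(size=(16, 64)), np.zeros(64)
w2, b2 = 0.1 * rng.normal(size=(64, 16)), np.zeros(16)
print(encoder_block(tokens, w1, b1, w2, b2).shape)  # (8, 16)
```

Because the mixing step is a fixed transform with no projections or attention weights, the block's only learned parameters sit in the feed-forward sublayer and the layer norms, which is where the reported speedups come from.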
The great success of transformer-based models in natural language processing (NLP) has led to variou...
Given a large Transformer model, how can we obtain a small and computationally efficient model which...
The Transformer architecture has revolutionized deep learning on sequential data, becoming ubiquitou...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
Transformer-based architectures are the model of choice for natural language understanding, but they...
The Transformer architecture has two main non-embedding components: Attention and the Feed Forward N...
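For reference, the two components named above can be sketched in a few lines of NumPy: single-head scaled dot-product attention and the position-wise feed-forward network. Weight names and the single-head simplification are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise feed-forward network: max(0, x W1 + b1) W2 + b2."""
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2
```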
We revisit the design choices in Transformers, and propose methods to address their weaknesses in ha...
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkab...
In this paper, we propose that the dot product pairwise matching attention layer, which is widely us...
Pretrained transformer models have demonstrated remarkable performance across various natural langua...
Recently, large-scale transformer-based models have been proven to be effective over various tasks a...
Characterizing neural networks in terms of better-understood formal systems has the potential to yie...
We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-at...
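The abstract above is truncated, but the core idea behind Performers (the FAVOR+ mechanism) can be sketched: the softmax kernel exp(q·k / sqrt(d)) is approximated in expectation by an inner product of positive random features, which lets attention be computed in time linear rather than quadratic in sequence length. The NumPy sketch below is our illustrative rendering of that standard construction, not the paper's reference code; the names and the feature count are assumptions.

```python
import numpy as np

def positive_random_features(x, proj):
    """phi(x) = exp(x @ W - ||x||^2 / 2) / sqrt(m), with W ~ N(0, I).
    In expectation, phi(q) . phi(k) = exp(q . k)."""
    m = proj.shape[1]
    return np.exp(x @ proj - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def favor_attention(q, k, v, proj):
    """Linear-time approximation of softmax attention via random features."""
    d = q.shape[-1]
    q_prime = positive_random_features(q / d**0.25, proj)  # absorbs the 1/sqrt(d) scaling
    k_prime = positive_random_features(k / d**0.25, proj)
    kv = k_prime.T @ v                          # (m, d_v): O(n m d), never forms the n x n matrix
    normalizer = q_prime @ k_prime.sum(axis=0)  # row sums of the implicit attention matrix
    return (q_prime @ kv) / normalizer[:, None]

# Toy usage (assumed sizes): 128 tokens, head size 32, 64 random features.
rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 128, 32))
proj = rng.normal(size=(32, 64))
print(favor_attention(q, k, v, proj).shape)  # (128, 32)
```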
With the success of Vision Transformers (ViTs) in computer vision tasks, recent works try to optimize...
The Transformer architecture is ubiquitously used as the building block of large-scale autoregressiv...