Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternativ...
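To make the contrast with MLPMixer's static token mixing concrete, here is a minimal sketch of a hypernetwork-generated token-mixing layer in PyTorch. It is an illustrative simplification under our own naming (HyperTokenMixer, d_model, d_hidden are assumptions), not the paper's exact implementation; details such as normalization, tied hypernetworks, and positional information are omitted.

import torch
import torch.nn as nn

class HyperTokenMixer(nn.Module):
    """Illustrative sketch: token-mixing MLP whose weights are generated
    by hypernetworks from the token representations themselves, rather
    than being a fixed matrix over token positions as in MLPMixer."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        # Hypernetworks: each token embedding yields one row of W1 / W2.
        self.hyper_w1 = nn.Linear(d_model, d_hidden)
        self.hyper_w2 = nn.Linear(d_model, d_hidden)
        self.activation = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, d_model)
        w1 = self.hyper_w1(x)  # (batch, num_tokens, d_hidden)
        w2 = self.hyper_w2(x)  # (batch, num_tokens, d_hidden)
        # Mix information across the token dimension using the generated weights.
        hidden = self.activation(torch.einsum("bnd,bnh->bhd", x, w1))
        return torch.einsum("bhd,bnh->bnd", hidden, w2)

if __name__ == "__main__":
    layer = HyperTokenMixer(d_model=64, d_hidden=128)
    tokens = torch.randn(2, 10, 64)   # 2 sequences of 10 tokens each
    print(layer(tokens).shape)        # torch.Size([2, 10, 64])

Because the mixing weights are produced from the input, the layer handles variable-length sequences and content-dependent interactions, which is the inductive bias the abstract argues a static token-mixing MLP lacks.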
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
The computation necessary for training Transformer-based language models has skyrocketed in recent y...
Transformer models cannot easily scale to long sequences due to their O(N^2) time and space complexi...
We show that Transformer encoder architectures can be sped up, with limited accuracy costs, by repla...
Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient ...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
MLP-Mixer has recently emerged as a new challenger to CNNs and Transformers. Despite ...
Pretrained transformer models have demonstrated remarkable performance across various natural langua...
This document aims to be a self-contained, mathematically precise overview of transformer architectu...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in ha...
Multi-head attention is a driving force behind state-of-the-art transformers, which achieve remarkab...
Limited computational budgets often prevent transformers from being used in production and from havi...
Transformers have shown great potential in computer vision tasks. A common belief is their attention...
Sparsely activated transformers, such as Mixture of Experts (MoE), have received great interest due ...
All-MLP architectures have attracted increasing interest as an alternative to attention-based models...