Can transformers generalize efficiently on problems whose examples vary in difficulty? We introduce a new task tailored to assess generalization across different complexities and present results indicating that standard transformers struggle to solve these tasks. These tasks are variations of the pointer value retrieval task introduced by Zhang et al. (2021). We investigate how a mechanism for adaptive and modular computation in transformers facilitates learning tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph). Based on our observations, we propose a transformer-based architecture called Hyper-UT, whi...
Mathematical reasoning is one of the most impressive achievements of human intellect but remains a f...
Since its introduction, the transformer model has demonstrated outstanding performance across variou...
Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neu...
Transformer networks have seen great success in natural language processing and machine vision, wher...
Algorithmic generalization in machine learning refers to the ability to learn the underlying algorit...
Artificial neural networks have become highly effective at performing specific, challenging tasks by...
We introduce the first multitasking vision transformer adapters that learn generalizable task affini...
A key feature of intelligent behaviour is the ability to learn abstract strategies that scale and tr...
In this work, we study rapid, step-wise improvements of the loss in transformers when being confront...
The Transformer architecture has revolutionized deep learning on sequential data, becoming ubiquitou...
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This chal...
In this paper, we propose that the dot product pairwise matching attention layer, which is widely us...
Despite progress across a broad range of applications, Transformers have limited success in systemat...
We propose a synthetic task, LEGO (Learning Equality and Group Operations), that encapsulates the pr...
Systematic generalization is the ability to combine known parts into novel meaning; an important asp...