Can transformers generalize efficiently on problems whose examples vary in difficulty? We introduce a new task tailored to assess generalization across different complexities and present results indicating that standard transformers struggle to solve these tasks. These tasks are variations of the pointer value retrieval task introduced by Zhang et al. (2021). We investigate how a mechanism for adaptive and modular computation in transformers facilitates learning tasks that demand generalization over the number of sequential computation steps (i.e., the depth of the computation graph). Based on our observations, we propose a transformer-based architecture called Hyper-UT, whi...
Mathematical reasoning is one of the most impressive achievements of human intellect but remains a f...
Since its introduction, the transformer model has demonstrated outstanding performance across variou...
Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neu...
Transformer networks have seen great success in natural language processing and machine vision, wher...
Algorithmic generalization in machine learning refers to the ability to learn the underlying algorit...
Artificial neural networks have become highly effective at performing specific, challenging tasks by...
We introduce the first multitasking vision transformer adapters that learn generalizable task affini...
A key feature of intelligent behaviour is the ability to learn abstract strategies that scale and tr...
In this work, we study rapid, step-wise improvements of the loss in transformers when being confront...
The Transformer architecture has revolutionized deep learning on sequential data, becoming ubiquitou...
Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. This chal...
In this paper, we propose that the dot product pairwise matching attention layer, which is widely us...
Despite progress across a broad range of applications, Transformers have limited success in systemat...
We propose a synthetic task, LEGO (Learning Equality and Group Operations), that encapsulates the pr...
Systematic generalization is the ability to combine known parts into novel meaning; an important asp...