Out-of-distribution generalization (OODG) is a longstanding challenge for neural networks. The challenge is especially apparent in tasks with well-defined variables and rules, where explicit use of the rules could solve problems independently of the particular values of the variables, but networks tend to be tied to the range of values sampled in their training data. Large transformer-based language models have pushed the boundaries of how well neural networks can solve previously unseen problems, but their complexity, and the lack of clarity about the relevant content in their training data, obscure how they achieve such robustness. As a step toward understanding how transformer-based systems generalize, we explore the question of OODG in small s...
By making assumptions on the probability distribution of the potentials in a feed-forward neural net...
The performance decay experienced by deep neural networks (DNNs) when confronted with distributional...
Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e....
In this paper, we study the OOD generalization of neural algorithmic reasoning tasks, where the goal...
Algorithmic generalization in machine learning refers to the ability to learn the underlying algorit...
Out-of-distribution (O.O.D.) generalization remains a key challenge for real-world machine lea...
This paper develops a novel methodology to simultaneously learn a neural network and extract general...
Most deep learning models fail to generalize in production. Indeed, sometimes data used during train...
Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neu...
We present a unified framework for a number of different ways of failing to generalize properly. Dur...
Generalization is a central aspect of learning theory. Here, we propose a framework that explores an...
Can transformers generalize efficiently on problems that require dealing with examples with differen...
With a direct analysis of neural networks, this paper presents a mathematically tight generalization...
Data-driven representations achieve powerful generalization performance in diverse information proce...
Machine learning models are typically configured by minimizing the training error over a given train...