Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN that interpolates its training data will typically improve its generalization performance. Explaining the mechanism behind this "benign overfitting" in deep networks remains an outstanding challenge. Here, we study the last hidden layer representations of various state-of-the-art convolutional neural networks and find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise. The number of such groups increases linearly with the width of the layer, but only if the width is above a critical value. We show tha...
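To make the kind of analysis described above concrete, here is a minimal sketch, assuming a samples-by-neurons matrix of last-hidden-layer activations is available: it groups neurons whose activations are highly correlated across a batch of inputs, treating such neurons as carrying the same signal up to independent noise. The random activation matrix, the 0.9 correlation threshold, and the use of average-linkage clustering are illustrative assumptions, not the authors' exact procedure.

    # Minimal sketch: cluster last-hidden-layer neurons by activation correlation.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    rng = np.random.default_rng(0)

    # Hypothetical activations: 1000 inputs x 512 last-hidden-layer neurons.
    # In practice these would come from a trained network's penultimate layer.
    n_samples, width = 1000, 512
    activations = rng.standard_normal((n_samples, width))

    # Pairwise correlation between neurons; distance = 1 - |correlation|.
    corr = np.corrcoef(activations, rowvar=False)
    dist = 1.0 - np.abs(corr)
    np.fill_diagonal(dist, 0.0)

    # Average-linkage clustering with a distance cutoff of 0.1, i.e. neurons
    # inside a group correlate with each other at roughly 0.9 or above.
    condensed = dist[np.triu_indices(width, k=1)]
    labels = fcluster(linkage(condensed, method="average"),
                      t=0.1, criterion="distance")

    n_groups = len(np.unique(labels))
    print(f"{width} neurons fall into {n_groups} correlated groups")

For a trained wide network one would expect far fewer groups than neurons, with the group count growing roughly linearly in the layer width above the critical value; for the random matrix used here every neuron ends up in its own group.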
We study the generalization behavior of transfer learning of deep neural networks (DNNs). We adopt t...
The excellent real-world performance of deep neural networks has received increasing attention. Desp...
The width of a neural network matters since increasing the width will necessarily increase the model...
A remarkable characteristic of overparameterized deep neural networks (DNNs) is that their accuracy ...
Representations learned by pre-training a neural network on a large dataset are increasingly used su...
Deep neural networks (DNNs...
Deep neural networks (DNN) with a huge number of adjustable parameters remain largely black boxes. T...
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, give...
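The Gaussianity claim above can be illustrated empirically. The following is a minimal sketch, assuming a small tanh MLP with 1/sqrt(fan_in) weight scaling: it samples the logit for one fixed input over many independent initializations and tests the empirical distribution for normality. The architecture, widths, and normality test are assumptions for illustration, not the cited paper's construction.

    # Minimal sketch: distribution of a random MLP's logit at initialization.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def random_mlp_logit(x, widths=(64, 64)):
        """One scalar logit of a freshly initialized tanh MLP with 1/sqrt(fan_in) scaling."""
        h = x
        for w in widths:
            W = rng.standard_normal((h.shape[0], w)) / np.sqrt(h.shape[0])
            h = np.tanh(h @ W)
        v = rng.standard_normal(h.shape[0]) / np.sqrt(h.shape[0])
        return h @ v

    x = rng.standard_normal(32)                      # one fixed input
    logits = np.array([random_mlp_logit(x) for _ in range(5000)])

    # At large width, the logit for a fixed input should look Gaussian.
    stat, p = stats.normaltest(logits)
    print(f"mean={logits.mean():.3f}, std={logits.std():.3f}, normaltest p={p:.3f}")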
Training with the true labels of a dataset as opposed to randomized labels leads to faster optimizat...
This paper underlines a subtle property of batch-normalization (BN): Successiv...
The understanding of generalization in machine learning is in a state of flux. This is partly due to...
As deep neural networks grow in size, from thousands to millions to billions of weights, the perform...