We prove in this paper that optimizing wide ReLU neural networks (NNs) with at least one hidden layer using l2-regularization on the parameters enforces multi-task learning through representation learning, also in the limit as the width tends to infinity. This contrasts with several other results in the literature, which assume idealized settings and in which wide (ReLU) NNs lose their ability to benefit from multi-task learning in the infinite-width limit. We deduce the multi-task learning ability by proving an exact quantitative macroscopic characterization of the learned NN in an appropriate function space.
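For concreteness, the training problem referred to above can be sketched as follows; the notation here is ours and is not taken from the paper, and the precise function-space characterization proved there is not reproduced. Consider a one-hidden-layer ReLU network with $m$ hidden units shared across $T$ tasks,
$$
f_\theta(x) \;=\; A\,\sigma(Wx + b) \in \mathbb{R}^{T}, \qquad \sigma(t) = \max(t,0), \qquad \theta = (A, W, b),
$$
trained with l2 (weight-decay) regularization on all parameters,
$$
\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} \ell\bigl(f_\theta(x_i)_t,\, y_{i,t}\bigr) \;+\; \lambda\Bigl(\lVert A\rVert_F^2 + \lVert W\rVert_F^2 + \lVert b\rVert_2^2\Bigr), \qquad \lambda > 0 .
$$
All $T$ tasks read out from the same hidden-layer features $\sigma(Wx + b)$, which is the sense in which a shared representation couples the tasks; the claim of the paper is that this coupling survives as $m \to \infty$.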