In a series of recent theoretical works, it was shown that strongly over-parameterized neural networks trained with gradient-based methods could converge exponentially fast to zero training loss, with their parameters hardly varying. In this work, we show that this "lazy training" phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels. Through a theoretical analysis, we exhibit various situations where this phenomenon arises in non-convex optimization and we provide bounds on the distance between the lazy and...
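A rough numerical sketch of this scaling effect (the two-layer tanh network, data sizes, learning rate, and the centering of the model output at initialization below are illustrative assumptions, not the paper's construction): scale the model output by a factor alpha and minimize the correspondingly rescaled squared risk; as alpha grows, the training loss still decreases while the distance traveled by the parameters shrinks roughly like 1/alpha, i.e. the model effectively behaves like its linearization at initialization.

import numpy as np

rng = np.random.default_rng(0)

# Toy over-parameterized two-layer tanh network (hypothetical setup for illustration).
m, d, n = 100, 5, 10                 # hidden units, input dimension, samples
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def h(W, a):
    # Unscaled model h(w, x) evaluated on the whole training set.
    return np.tanh(X @ W.T) @ a / np.sqrt(m)

def grad(W, a, r):
    # Gradient of sum_i r_i * h_i with respect to (W, a).
    Z = np.tanh(X @ W.T)                                   # (n, m) hidden activations
    ga = Z.T @ r / np.sqrt(m)                              # output-weight gradient
    gW = ((1 - Z**2) * np.outer(r, a)).T @ X / np.sqrt(m)  # hidden-weight gradient
    return gW, ga

def train(alpha, steps=3000, lr=0.2):
    W = rng.normal(size=(m, d))
    a = rng.normal(size=m)
    W0, a0, h0 = W.copy(), a.copy(), h(W, a)
    for _ in range(steps):
        pred = alpha * (h(W, a) - h0)       # scaled model, centered so pred = 0 at init
        r = (pred - y) / (n * alpha)        # gradient of (1/alpha^2) * squared risk w.r.t. h
        gW, ga = grad(W, a, r)
        W -= lr * gW
        a -= lr * ga
    move = np.sqrt(np.sum((W - W0)**2) + np.sum((a - a0)**2))
    loss = np.mean((alpha * (h(W, a) - h0) - y)**2)
    return loss, move

for alpha in (1.0, 10.0, 100.0):
    loss, move = train(alpha)
    print(f"alpha={alpha:7.1f}   final loss={loss:.2e}   ||w - w0|| = {move:.4f}")

# The loss decreases comparably for every alpha, while the parameter displacement
# shrinks roughly like 1/alpha: the large-scaling runs are in the lazy regime,
# where training is governed by the tangent kernel at initialization.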
Over the past decade, deep neural networks have solved ever more complex tasks across many fronts in...
For nonconvex optimization in machine learning, this a...
Large-scale machine learning problems can be reduced to non-convex optimization problems if state-of...
Two distinct limits for deep learning have been derived as the network width h -> infinity, dependin...
In the training of over-parameterized model functions via gradient descent, sometimes the parameters...
The remarkable practical success of deep learning has revealed some major surprises from a theoretic...
Neural networks trained via gradient descent with random initialization and without any regularizati...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
Recent works show that random neural networks are vulnerable to adversarial attacks [Daniely an...
Among attempts at giving a theoretical account of the success of deep neural networks, a recent line...
Conventional wisdom in deep learning states that increasing depth improves expressiveness but compli...
The success of deep learning has shown impressive empirical breakthroughs, but many theoretical ques...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent ...