In the absence of convexity, overparametrization is a key factor in explaining the global convergence of gradient descent (GD) for neural networks. Besides the well-studied lazy regime, an infinite-width (mean-field) analysis has been developed for shallow networks, relying on convex optimization techniques. To bridge the gap between the lazy and mean-field regimes, we study Residual Networks (ResNets) in which the residual block has linear parametrization while still being nonlinear. Such ResNets admit both infinite-depth and infinite-width limits, encoding residual blocks in a Reproducing Kernel Hilbert Space (RKHS). In this limit, we prove a local Polyak-Lojasiewicz inequality. Thus, every critical point is a global minimizer and a local convergence result of GD...
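For context, the (local) Polyak-Lojasiewicz inequality mentioned above is standardly written as below; the loss $L$, optimal value $L^{\star}$, constant $\mu > 0$, and neighborhood $U$ are generic symbols used here for illustration and are not notation taken from the paper.

$$
\frac{1}{2}\,\|\nabla L(\theta)\|^{2} \;\ge\; \mu\,\bigl(L(\theta) - L^{\star}\bigr) \qquad \text{for all } \theta \in U.
$$

On $U$, this rules out spurious critical points (any $\theta$ with $\nabla L(\theta) = 0$ satisfies $L(\theta) = L^{\star}$) and, together with smoothness, yields linear convergence of GD initialized in $U$.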
We study the overparametrization bounds required for the global convergence of stochastic gradient d...
In this paper, we present a new strategy to prove the convergence of deep lear...
For nonconvex optimization in machine learning, this a...
In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number...
Deep learning has become an important toolkit for data science and artificial intelligence. In contr...
Many supervised machine learning methods are naturally cast as optimization pr...
Various powerful deep neural network architectures have made great contributions to the exciting succ...
We prove linear convergence of gradient descent to a global minimum for the training of deep residua...
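As a generic reminder (a standard statement under smoothness and a PL-type condition, not the paper's specific bound), linear convergence of GD means the optimality gap decays geometrically:

$$
L(\theta_{t}) - L^{\star} \;\le\; (1 - c)^{t}\,\bigl(L(\theta_{0}) - L^{\star}\bigr), \qquad c \in (0, 1),
$$

where $c$ typically depends on the step size, the smoothness constant, and the PL constant.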
Recent results in the ...
Existing global convergence guarantees of (stochastic) gradient descent do not apply to practical de...
Neural networks have been very successful in many applications; we often, however, lack a theoretica...
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks wit...
Recent works have shown that gradient descent can find a global minimum for over-parameterized neura...
We prove that two-layer (Leaky)ReLU networks with one-dimensional input and output trained using gra...