Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology. At each step of the training phase, a mini-batch of samples is drawn from the training dataset, and the weights of the neural network are adjusted according to the performance on this specific subset of examples. The mini-batch sampling procedure introduces stochastic dynamics into the gradient descent, with a non-trivial state-dependent noise. We characterize the stochasticity of SGD and of a recently introduced variant, persistent SGD, in a prototypical neural network model. In the under-parametrized regime, where the final training error is positive, the SGD dynamics reaches a stationary state and we define an effective temperature from the fluct...
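To make the sampling procedure concrete, here is a minimal NumPy sketch of one update step of plain mini-batch SGD next to one step of the persistent variant. It assumes the usual formulation in which persistent SGD refreshes each sample's batch-membership flag only at a small rate 1/tau rather than redrawing the whole batch every step; the function names, the batch fraction b, and the persistence time tau are illustrative choices of this sketch, not notation taken from the abstract.

import numpy as np

def sgd_step(w, X, y, grad_loss, lr, batch_size, rng):
    """Plain mini-batch SGD: redraw a fresh random mini-batch at
    every step and move against its average gradient."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return w - lr * grad_loss(w, X[idx], y[idx])

def persistent_sgd_step(w, X, y, grad_loss, lr, mask, b, tau, rng):
    """Persistent SGD (sketch): each sample carries a binary
    membership flag; a flag is refreshed only with probability
    1/tau per step (set to True with probability b), so the batch
    decorrelates over ~tau steps instead of at every step."""
    refresh = rng.random(len(X)) < 1.0 / tau
    mask[refresh] = rng.random(refresh.sum()) < b
    batch = np.flatnonzero(mask)
    if batch.size:  # guard against the (unlikely) empty batch
        w = w - lr * grad_loss(w, X[batch], y[batch])
    return w, mask

# Toy usage on least-squares regression (all names illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
y = X @ rng.standard_normal(20)
grad = lambda w, Xb, yb: 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w, mask = np.zeros(20), rng.random(1000) < 0.1
for _ in range(2000):
    w, mask = persistent_sgd_step(w, X, y, grad, lr=0.05, mask=mask,
                                  b=0.1, tau=20.0, rng=rng)

The noise is state-dependent in the sense described above because the spread of the per-sample gradients, and hence the covariance of the stochastic update, changes with the current weights w.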
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we...
A theoretical, and potentially also practical, problem with stochastic gradient descent is that traj...
We analyze in closed form the learning dynamics...
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used extensively to train artif...
Is Stochastic Gradient Descent (SGD) substantially different from Glauber dynamics? This is a fundam...
Stochastic gradient descent (SGD) has been widely used in machine learning due...
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achiev...
Understanding the implicit bias of training algorithms is crucial in order to explain ...
We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent and weight d...
In this thesis, we are concerned with the Stochastic Gradient Descent (SGD) algorithm. Specifically,...
The deep learning optimization community has observed how neural networks' generalization ability...
The gradient noise of SGD is considered to play a central role in the observed strong generalization...
Thesis (Ph.D.), University of Washington, 2019. Tremendous advances in large-scale machine learning an...
Over the decades, gradient descent has been applied to develop learning algorithms to train neural netw...