We develop the mathematical foundations of the stochastic modified equations (SME) framework for analyzing the dynamics of stochastic gradient algorithms, where the latter is approximated by a class of stochastic differential equations with small noise parameters. We prove that this approximation can be understood mathematically as a weak approximation, which leads to a number of precise and useful results on the approximations of stochastic gradient descent (SGD), momentum SGD and stochastic Nesterov's accelerated gradient method in the general setting of stochastic objectives. We also demonstrate through explicit calculations that this continuous-time approach can uncover important analytical insights into the stochastic gradient algorith...
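The SDE approximation described above can be illustrated with a small numerical sketch. This is not the paper's implementation, only a minimal example under illustrative assumptions: we take the noisy quadratic objective f(x) = x²/2 with additive Gaussian gradient noise, run plain SGD with step size η, and simulate the corresponding order-1 stochastic modified equation dX = -f'(X) dt + √η σ dW by Euler–Maruyama with time step η. All function names and parameter values are hypothetical.

```python
import random

def sgd(x0, eta, sigma, steps, rng):
    """Plain SGD on f(x) = x^2/2 with gradient noise of scale sigma."""
    x = x0
    for _ in range(steps):
        grad = x + sigma * rng.gauss(0.0, 1.0)  # stochastic gradient estimate
        x -= eta * grad
    return x

def sme(x0, eta, sigma, steps, rng):
    """Euler-Maruyama simulation of the order-1 SME
    dX = -X dt + sqrt(eta) * sigma dW, with time step dt = eta."""
    x = x0
    for _ in range(steps):
        drift = -x * eta                                   # -f'(X) dt
        diffusion = (eta ** 0.5) * sigma * (eta ** 0.5) * rng.gauss(0.0, 1.0)
        x += drift + diffusion                             # sqrt(eta)*sigma dW
    return x
```

For this linear example the two update rules coincide in distribution (both reduce to the AR(1) recursion x ← (1-η)x - ησ·N(0,1)), which makes the weak-approximation claim easy to check empirically: ensemble averages of SGD iterates and of the simulated SME agree. For nonquadratic objectives the approximation holds only in the weak sense and up to an error controlled by η, which is the content of the theorems the abstract refers to.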
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios w...
The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its prop...
Stochastic gradient descent (SGD) is arguably the most important algorithm used in optimization prob...
Abstract: Stochastic gradient descent is an optimisation method that combines classical gradient des...
In this thesis we want to give a theoretical and practical introduction to stochastic gradient desce...
We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise σ ...
Stochastic approximation (SA) is a classical algorithm that has had since the early days a huge impa...
We develop a new continuous-time stochastic gradient descent method for optimizing over the stationa...
In this article, a family of SDEs is derived as a tool to understand the behavior of numerical opti...
94 pages, 4 figures. This paper proposes a thorough theoretical analysis of Stochastic Gradient Descen...
Gradient-based optimization algorithms, in particular their stochastic counterparts, have become by ...
This paper presents an overview of gradient-based methods for minimization of noisy functions. It i...
Stochastic gradient descent (SGD) has been widely used in machine learning due...
In this paper, a general stochastic optimization procedure is studied, unifyin...