Stochastic Gradient Descent (SGD) is the workhorse for training large-scale machine learning applications. Although the convergence rate of its deterministic counterpart, Gradient Descent (GD), is provably accelerated by momentum-based variants such as Heavy Ball (HB) and Nesterov Accelerated Gradient (NAG), local convergence analysis has not shown that these modifications yield faster rates in the stochastic setting. This work empirically establishes that a positive momentum coefficient in SGD effectively enlarges the algorithm's learning rate rather than boosting performance per se. In the deep learning setting, however, this enlargement tends to be conduc...
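As a quick illustration of the learning-rate claim above, here is a minimal sketch assuming the standard heavy-ball (SGDM) recursion; the notation is ours, not the paper's:

\[ v_{t+1} = \beta v_t + \nabla f(x_t), \qquad x_{t+1} = x_t - \alpha v_{t+1}. \]

If the gradient changes slowly across iterations, the velocity approaches the geometric sum \( v_t \approx \sum_{k \ge 0} \beta^k \nabla f(x_t) = \tfrac{1}{1-\beta} \nabla f(x_t) \), so each step behaves like plain SGD with an effective learning rate of \( \alpha/(1-\beta) \), e.g. ten times larger when \( \beta = 0.9 \).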
Large-scale learning problems require algorithms that scale benignly with respect to the size of the...
Short version of https://arxiv.org/abs/1709.01427. When applied to training deep...
SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machi...
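As a rough sketch of the SGDM update these entries refer to, assuming the common heavy-ball formulation (function names and constants below are illustrative, not taken from any of the cited papers):

```python
import numpy as np

def sgdm_step(x, v, grad, lr=0.01, momentum=0.9):
    # Velocity accumulates a geometrically decaying sum of past gradients.
    v = momentum * v + grad
    # Parameters move against the accumulated direction, scaled by the learning rate.
    x = x - lr * v
    return x, v

# Toy usage on the quadratic f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([5.0, -3.0])
v = np.zeros_like(x)
for _ in range(200):
    x, v = sgdm_step(x, v, grad=x)
print(x)  # approaches the minimizer at the origin
```

Setting momentum to zero recovers plain SGD, which is why SGDM is usually described as a family of algorithms that contains SGD as a special case.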
Momentum is known to accelerate the convergence of gradient descent in strongly convex settings with...
© 2018 International Joint Conferences on Artificial Intelligence. All rights reserved. Stochastic mo...
Recently, Stochastic Gradient Descent (SGD) and its variants have become the dominant methods in the...
Thesis (Ph.D.), University of Washington, 2019. Tremendous advances in large scale machine learning an...
This paper examines the convergence rate and mean-square-error performance of momentum stochastic gr...
The article examines in some detail the convergence rate and mean-square-error performance of moment...
Stochastic Gradient Descent (SGD) and its variants are the most widely used algorithms in machine learning ...
Momentum-based learning algorithms are among the most successful learning algorithms in both convex...
Gradient descent-based optimization methods underpin the parameter training that results in the impr...
In the age of artificial intelligence, the best approach to handling huge amounts of data is a treme...