SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of $\Omega(\frac{\ln T}{\sqrt{T}})$ after $T$ iterations. Based on ...
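Since the abstract centers on the behavior of the last iterate of SGDM, a small sketch may help fix notation. Below is a minimal heavy-ball-style SGDM loop on the Lipschitz convex function $f(x) = |x|$ with noisy subgradients, tracking both the last iterate and the running average of the iterates. The step-size schedule, momentum constant, and test function are illustrative assumptions, not constructions from the paper; in particular, the $\Omega(\frac{\ln T}{\sqrt{T}})$ lower-bound instance and the FTRL-based variants with increasing momentum are not reproduced here.

```python
import numpy as np

# Minimal sketch of SGD with a constant momentum factor (heavy-ball form).
# All constants and the objective are illustrative, not from the paper.

rng = np.random.default_rng(0)

def noisy_subgradient(x, noise_std=1.0):
    # Subgradient of f(x) = |x|, corrupted by zero-mean Gaussian noise.
    return np.sign(x) + rng.normal(0.0, noise_std)

def sgdm(x0, T, beta=0.9, eta0=1.0):
    x, m, avg = x0, 0.0, 0.0
    for t in range(1, T + 1):
        g = noisy_subgradient(x)
        m = beta * m + g                  # momentum buffer
        x = x - (eta0 / np.sqrt(t)) * m   # O(1/sqrt(t)) step size
        avg += (x - avg) / t              # running average of the iterates
    return x, avg                         # last iterate vs. averaged iterate

last, averaged = sgdm(x0=5.0, T=10_000)
print(f"f(last iterate)     = {abs(last):.4f}")
print(f"f(averaged iterate) = {abs(averaged):.4f}")
```

Comparing the two returned points illustrates the distinction the abstract draws: classical guarantees apply to the averaged iterate, while practice typically uses the last iterate, whose rate is the subject of the lower bound above.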
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochas...
The article examines in some detail the convergence rate and mean-square-error performance of moment...
The vast majority of convergence rates analysis for stochastic gradient methods in the literature fo...
Recently, Stochastic Gradient Descent (SGD) and its variants have become the dominant methods in the...
Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning ...
The stochastic momentum method is a commonly used acceleration technique for solving large-scale sto...
Momentum is known to accelerate the convergence of gradient descent in strongly convex settings with...
The momentum acceleration technique is widely adopted in many optimization algorithms. However, ther...
In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class...
Stochastic Gradient Descent (SGD) is the workhorse for training large-scale machine learning applica...
Momentum methods have been shown to accelerate the convergence of the standard gradient descent algo...
Stochastic mo...
Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale o...
We study the convergence of accelerated stochastic gradient descent for strongly convex objectives u...