We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient (SGD) methods, where the step sizes change based on observed stochastic gradients, for minimizing non-convex, smooth objectives. Despite their popularity, the analysis of adaptive SGD lags behind that of non-adaptive methods in this setting. Specifically, all prior works rely on some subset of the following assumptions: (i) uniformly bounded gradient norms, (ii) uniformly bounded stochastic gradient variance (or even noise support), (iii) conditional independence between the step size and the stochastic gradient. In this work, we show that AdaGrad-Norm exhibits an order-optimal convergence rate of $\mathcal{O}\left(\frac{\mathrm{poly}\log(T)}{\sqrt{T}}\right)$ ...
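AdaGrad-Norm uses a single scalar step size, a fixed base step divided by the running root sum of squared stochastic gradient norms. Below is a minimal sketch of that update; the helper `sgrad`, the base step `eta`, and the initializer `b0` are illustrative assumptions, not notation from the abstract above.

```python
import numpy as np

def adagrad_norm(sgrad, x0, eta=1.0, b0=1e-2, T=1000):
    """Sketch of AdaGrad-Norm: one scalar step size eta / b_{t+1}, where
    b_{t+1}^2 accumulates squared stochastic gradient norms.
    `sgrad(x)` returns an unbiased estimate of the gradient at x."""
    x = np.asarray(x0, dtype=float)
    b2 = b0 ** 2                      # running sum of squared gradient norms
    for _ in range(T):
        g = sgrad(x)                  # stochastic gradient at the current iterate
        b2 += float(np.dot(g, g))     # b_{t+1}^2 = b_t^2 + ||g_t||^2
        x = x - (eta / np.sqrt(b2)) * g
    return x

# Toy usage: minimize f(x) = ||x||^2 from noisy gradient estimates.
rng = np.random.default_rng(0)
noisy_grad = lambda x: 2.0 * x + rng.normal(scale=0.1, size=x.shape)
x_final = adagrad_norm(noisy_grad, x0=np.ones(5))
```

Note that the divisor $b_{t+1}$ already includes the current gradient $g_t$, so the step size is correlated with the gradient it multiplies; this is exactly the step-size/gradient correlation that the conditional-independence assumption (iii) in prior analyses rules out.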
We consider the minimization of an objective function given access to unbiased estimates of its grad...
We propose a new adaptive algorithm with decreasing step-size for stochastic approximations....
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization meth...
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochas...
Stochastic Gradient Descent (SGD) is a popular tool in training large-scale machine learning models....
Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent (SGD) with stochas... (a sketch of the stochastic Polyak step size follows this list).
We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ ...
This paper presents two adaptive step-size gradient adaptive filters. The step sizes ...
In this study, we demonstrate that the norm test and inner product/orthogonality test presented in \...
This paper presents an adaptive step-size gradient adaptive filter. The step...
In this paper, we propose a class of faster adaptive Gradient Descent Ascent (GDA) methods for solvin...
The recently proposed stochastic Polyak stepsize (SPS) and stochastic linesearch (SLS) for SGD have ...
Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function...
Tuning the step size of stochastic gradient descent is tedious and error-prone. This has motivated t...
We consider the optimization of a smooth and strongly convex objective using constant step-size stoc...
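Two of the snippets above reference the stochastic Polyak step size (SPS) of Loizou et al. (2021). The following is a minimal, hedged sketch of SGD with the capped variant of that step size; the helper `sample_loss_grad`, the constants `c` and `gamma_max`, and the lower bound `f_star = 0` are illustrative assumptions, not taken from the truncated abstracts.

```python
import numpy as np

def sgd_sps(sample_loss_grad, x0, c=0.5, gamma_max=1.0, f_star=0.0, T=1000):
    """Sketch of SGD with a capped stochastic Polyak step size:
    gamma_t = min((f_i(x_t) - f_i^*) / (c * ||g_i(x_t)||^2), gamma_max).
    `sample_loss_grad(x)` returns (loss, gradient) on a random sample;
    `f_star` is a lower bound on the sampled losses (0 for many ML losses)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        loss, g = sample_loss_grad(x)
        g_norm2 = float(np.dot(g, g))
        if g_norm2 == 0.0:            # this sample is already stationary
            continue
        gamma = min((loss - f_star) / (c * g_norm2), gamma_max)
        x = x - gamma * g
    return x
```

The cap `gamma_max` keeps steps bounded far from the optimum, where the uncapped Polyak ratio can be large; without noise and with `c = 1/2`, the uncapped rule recovers the classical Polyak step on each sample.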