Tuning the step size of stochastic gradient descent is tedious and error prone. This has motivated the development of methods that automatically adapt the step size using readily available information. In this paper, we consider the family of SPS (Stochastic gradient with a Polyak Stepsize) adaptive methods. These are methods that make use of gradient and loss value at the sampled points to adaptively adjust the step size. We first show that SPS and its recent variants can all be seen as extensions of the Passive-Aggressive methods applied to nonlinear problems. We use this insight to develop new variants of the SPS method that are better suited to nonlinear models. Our new variants are based on introducing a slack variable into the interpo...
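The Polyak-stepsize rule described above (using the sampled loss and gradient to set the step size) can be sketched in a few lines. The sketch below is a hypothetical helper, not the authors' implementation: `f_star` stands for the optimal value of the sampled loss (zero under interpolation) and `gamma_max` is the cap used by the SPS_max variant of Loizou et al. (2021).

```python
import numpy as np

def sps_step(x, grad, loss, f_star=0.0, gamma_max=1.0, eps=1e-12):
    """One SGD step with a capped stochastic Polyak stepsize (SPS_max sketch).

    The stepsize is gamma = (f_i(x) - f_i^*) / ||grad f_i(x)||^2, computed
    from the sampled loss and gradient only, then capped at gamma_max.
    """
    gamma = (loss - f_star) / (np.dot(grad, grad) + eps)
    return x - min(gamma, gamma_max) * grad
```

On a consistent least-squares problem (where interpolation holds, so `f_star = 0` for every sample) this update coincides with a damped randomized Kaczmarz step, which is one way to see why no stepsize tuning is needed.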
In recent years several proposals for the step-size selection have largely improved the gradient me...
Stochastic Gradient Descent (SGD) is a popular tool in training large-scale machine learning models....
A difficulty in using Simultaneous Perturbation Stochastic Approximation (SPSA) is its performance ...
Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent (SGD) with stochas...
The recently proposed stochastic Polyak stepsize (SPS) and stochastic linesearch (SLS) for SGD have ...
Stochastic gradient descent (SGD) is commonly used in solving finite sum optimization problems. The ...
We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise σ ...
The steplength selection is a crucial issue for the effectiveness of the stochastic gradient methods...
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (S...
Noise is inherent in many optimization methods such as stochastic gradient methods, zeroth-order me...
Mini-batch stochastic gradient descent (SGD) and variants thereof approximate the objective function...
In view of a direct and simple improvement of vanilla SGD, this paper presents...
The convergence of Stochastic Gradient Descent (SGD) us...
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochas...
A theoretical, and potentially also practical, problem with stochastic gradient descent is that traj...