In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case. To do so, one estimates curvature based on a local quadratic model, using only noisy gradient approximations. One obtains a new stochastic first-order method (Step-Tuned SGD), enhanced by second-order information, which can be seen as a stochastic version of the classical Barzilai-Borwein method. Our theoretical results ensure almost sure convergence to the critical set, and we provide convergence rates. Experiments on deep residual network training illustrate the favorable properties of our approach. For such networks we observe, during training, both a sudden drop of the loss and an improvement of test accuracy at medium stages, even for very large batch sizes.
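To make the underlying idea concrete, the sketch below illustrates a generic stochastic Barzilai-Borwein-type step-size rule of the kind the paper builds on: the classical BB1 step alpha_k = <s, s> / <s, y>, with s the iterate difference and y the gradient difference, is computed from two gradient evaluations on the same mini-batch, safeguarded, and then used in a plain SGD update. This is a minimal illustration under stated assumptions, not the authors' exact Step-Tuned SGD procedure; the names stochastic_bb_sgd and grad_fn, the safeguard bounds, and the toy data are hypothetical choices made for this example.

import numpy as np

def stochastic_bb_sgd(grad_fn, x0, batches, alpha0=0.1,
                      alpha_min=1e-6, alpha_max=1.0):
    # Hypothetical sketch of a stochastic Barzilai-Borwein step-size rule.
    # grad_fn(x, batch) returns a mini-batch gradient estimate at x.
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    for batch in batches:
        g = grad_fn(x, batch)
        x_trial = x - alpha * g            # tentative step
        g_trial = grad_fn(x_trial, batch)  # same batch, so the difference
                                           # quotient reflects curvature, not noise
        s = x_trial - x
        y = g_trial - g
        sy = float(s @ y)
        if sy > 1e-12:                     # local quadratic model has positive curvature
            alpha = float(s @ s) / sy      # BB1 step from the local quadratic model
            alpha = min(max(alpha, alpha_min), alpha_max)  # safeguard the step size
        x = x - alpha * g                  # SGD step with the tuned step size
    return x

# Toy usage on a mini-batch least-squares problem (hypothetical data).
rng = np.random.default_rng(0)
A = rng.normal(size=(256, 10))
b = rng.normal(size=256)

def grad_fn(x, idx):
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)

batches = [rng.choice(256, size=32, replace=False) for _ in range(200)]
x_hat = stochastic_bb_sgd(grad_fn, np.zeros(10), batches)

Evaluating both gradients on the same mini-batch is the key design choice: it keeps the curvature estimate <s, y> / <s, s> meaningful despite gradient noise, which is what allows second-order information to be extracted from first-order stochastic oracles.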