Classically, the time complexity of a first-order method is estimated by its number of gradient computations. In this paper, we study a more refined complexity by taking into account the “lingering” of gradients: once a gradient is computed at x_k, the additional time to compute gradients at x_{k+1}, x_{k+2}, . . . may be reduced. We show how this improves the running time of gradient descent and SVRG. For instance, if the “additional time” scales linearly with respect to the traveled distance, then the “convergence rate” of gradient descent can be improved from 1/T to exp(−T^{1/3}). On the empirical side, we solve a hypothetical revenue management problem on the Yahoo! Front Page Today Module applic...
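A minimal toy sketch of the cost model described in this abstract, not the paper's algorithm: after a full gradient is computed at a point, the next gradient is charged a cost proportional to the distance traveled since then. The quadratic objective, step size, number of steps, and cost coefficients below are illustrative assumptions.

```python
# Toy illustration (assumed setup) of the "lingering gradients" accounting:
# the first gradient costs a full unit; each later gradient is charged
# per_distance_cost * ||x_k - x_{k-1}|| instead of a full recomputation.
import numpy as np

def grad(x, A, b):
    """Gradient of the quadratic f(x) = 0.5 * x^T A x - b^T x (toy objective)."""
    return A @ x - b

def gd_with_lingering_cost(A, b, x0, lr=0.1, steps=200,
                           full_cost=1.0, per_distance_cost=0.5):
    """Plain gradient descent, timed under the lingering cost model."""
    x = x0.copy()
    prev_x = None
    total_time = 0.0
    for _ in range(steps):
        g = grad(x, A, b)
        if prev_x is None:
            total_time += full_cost  # first gradient: full price
        else:
            # subsequent gradients: cost scales with distance traveled
            total_time += per_distance_cost * np.linalg.norm(x - prev_x)
        prev_x = x.copy()
        x = x - lr * g
    return x, total_time

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag(rng.uniform(0.5, 2.0, size=10))  # well-conditioned toy problem
    b = rng.normal(size=10)
    x, t = gd_with_lingering_cost(A, b, x0=np.zeros(10))
    print(f"residual norm: {np.linalg.norm(A @ x - b):.2e}, accounted time: {t:.2f}")
    print("classical accounting would charge", 200 * 1.0, "units for the same run")
```

Under this accounting, short steps near the optimum become cheap, which is the intuition behind the improved running-time bounds claimed in the abstract.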
For smooth and strongly convex optimizations, the optimal iteration complexity of the gradient-based...
The integration to steady state of many initial value ODEs and PDEs using the forward Euler method c...
Online learning algorithms often require recomputing least squares regression estimates of paramete...
Noise is inherent in many optimization methods such as stochastic gradient methods, zeroth-order me...
The practical performance of online stochastic gradient descent algorithms is highly depende...
We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) ...
We present and computationally evaluate a variant of the fast gradient method by Nesterov that is ca...
During the last few decades, several papers were published about second-order optimizatio...
We study a scalable alternative to robust gradient descent (RGD) techniques that can be used when lo...
Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its ...
Interpreting gradient methods as fixed-point iterations, we provide a detailed analysis of those met...
The performance of stochastic gradient descent (SGD) depends critically on how learning rates are ...
The Inexact Gradient Method with Memory (IGMM) is able to considerably outperform the Gradient Metho...
Gradient descent is slow to converge for ill-conditioned problems and non-convex problems. An import...