In this work, we introduce AdaCN, a novel adaptive cubic Newton method for nonconvex stochastic optimization. AdaCN dynamically captures the curvature of the loss landscape through a diagonally approximated Hessian plus the norm of the difference between the previous two estimates. It requires only first-order gradients and updates with linear complexity in both time and memory. To reduce the variance introduced by the stochastic nature of the problem, AdaCN uses the first and second moments to implement exponential moving averages over the iteratively updated stochastic gradients and approximated stochastic Hessians, respectively. We validate AdaCN in extensive experiments, showing that it outperforms other stochastic first-order methods (...
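The abstract names the ingredients (a diagonal Hessian approximation, cubic regularization, and Adam-style moment averaging) but not the exact update rule. The following is a minimal sketch of how such a step could look, assuming a secant-style diagonal curvature estimate and the closed-form minimizer of a per-coordinate cubic model; the function name, hyperparameters, and the secant heuristic are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def adacn_style_step(x, grad, state, lr=1.0, beta1=0.9, beta2=0.999, M=1.0, eps=1e-8):
    """One AdaCN-style update (illustrative sketch, not the paper's exact rule)."""
    if state.get("prev_grad") is None:
        # First call: initialize moment buffers; no curvature information yet.
        state.update(prev_grad=grad, prev_x=x,
                     m=np.zeros_like(x), v=np.zeros_like(x), t=0)
    # Secant-style diagonal curvature estimate from the last two gradients
    # (an assumption; the abstract only says "diagonally approximated Hessian").
    dx = x - state["prev_x"]
    dg = grad - state["prev_grad"]
    h_hat = np.abs(dg) / (np.abs(dx) + eps)
    state["t"] += 1
    # Exponential moving averages of the stochastic gradient (first moment)
    # and of the diagonal Hessian estimate (second moment), as in the abstract.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * h_hat
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # Adam-style bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # Closed-form minimizer of the per-coordinate cubic model
    #   g*s + 0.5*h*s**2 + (M/6)*|s|**3,  i.e.  s = -2g / (h + sqrt(h**2 + 2*M*|g|)).
    step = -2.0 * m_hat / (v_hat + np.sqrt(v_hat**2 + 2.0 * M * np.abs(m_hat)) + eps)
    state["prev_grad"], state["prev_x"] = grad, x
    return x + lr * step, state
```

Note that the per-coordinate cubic step s = -2g / (h + sqrt(h^2 + 2M|g|)) reduces to the Newton step -g/h as M -> 0 and stays bounded when the curvature estimate h is small, which is why cubic regularization is attractive in the nonconvex stochastic setting.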
Incorporating curvature information in stochastic methods has been a challenging task. This paper pr...
In this article, we present three smoothed functional (SF) algorithms for simulation optimization. Wh...
In view of a direct and simple improvement of vanilla SGD, this paper presents...
Stochastic gradient descent is the method of choice for solving large-scale optimization problems in...
We study stochastic Cubic Newton methods for solving general possibly non-convex minimization proble...
This paper studies some asymptotic properties of adaptive algorithms widely used in...
Incorporating second-order curvature information into machine learning optimization algorithms can b...
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective ...
Gradient-based optimization algorithms, in particular their stochastic counterparts, have become by ...
This thesis presents a family of adaptive curvature methods for gradient-based stochastic ...
We provide new adaptive first-order methods for constrained convex optimization. Our main algorithms...
Recent work has established an empirically successful framework for adapting learning rates for stoc...
We consider the fundamental problem in nonconvex optimization of efficiently reaching a stationary p...
We present new algorithms for simulation optimization using random directions stochastic approximati...
While first-order methods are popular for solving optimization problems that arise in large-scale de...