Is it possible for a first-order method, i.e., one that uses only first derivatives, to be quadratically convergent? For univariate loss functions, the answer is yes: the Steffensen method avoids second derivatives and is still quadratically convergent, like Newton's method. By incorporating an optimal step size we can even push its convergence order beyond quadratic to $1+\sqrt{2} \approx 2.414$. While such high convergence orders are pointless overkill for a deterministic algorithm, they become rewarding when the algorithm is randomized for problems of massive size, as randomization invariably compromises convergence speed. We introduce two adaptive learning rates inspired by the Steffensen method, intended for use in a stochastic optimization setting.
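To make the opening claim concrete, the following is a minimal sketch (not the paper's stochastic algorithm) of the classical Steffensen iteration applied to minimizing a univariate loss $f$: a stationary point is found as a root of $g = f'$ using only first-derivative evaluations, with the divided difference $(g(x + g(x)) - g(x))/g(x)$ standing in for $f''(x)$. The loss, starting point, and function names below are illustrative assumptions, not from the source.

```python
def steffensen_minimize(grad, x0, tol=1e-12, max_iter=50):
    """Find a stationary point of a univariate loss via Steffensen's method.

    grad : callable returning f'(x); no second derivatives are used.
    The divided difference (grad(x + g) - g) / g acts as a
    derivative-free surrogate for f''(x), preserving quadratic
    local convergence.
    """
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:          # gradient small enough: stationary point
            break
        denom = grad(x + g) - g   # divided difference, approximates f''(x) * g
        if denom == 0:
            break
        x = x - g * g / denom     # Steffensen update
    return x

# Illustrative usage: minimize f(x) = x^4/4 - 2x, whose gradient is x^3 - 2.
x_star = steffensen_minimize(lambda x: x**3 - 2.0, x0=1.0)
print(x_star)  # converges to 2 ** (1/3) ≈ 1.2599
```

Note that only `grad` is ever evaluated, yet near the solution the error is squared at each step, which is the property the abstract builds on before randomizing the iteration.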
Current machine learning practice requires solving huge-scale empirical risk minimization problems q...
This thesis is concerned with stochastic optimization methods. The pioneering work in the field is t...
We develop a line-search second-order algorithmic framework for minimizing finite sums. We do not ma...
The notable changes over the current version: - worked example of convergence rates showing SAG can ...
In this paper, a family of Steffensen type methods of fourth-order convergence for solving nonlinear...
We study convergence rates of AdaGrad-Norm as an exemplar of adaptive stochastic gradient methods (S...
This work proposes a universal and adaptive second-order method for minimizing second-order smooth, ...
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization meth...
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appea...
Optimization is important in machine learning problems, and quasi-Newton methods have a reputation a...
Stochastic Gradient (SG) is the defacto iterative technique to solve stochastic optimization (SO) pr...
We propose a new globally convergent stochastic second order method. Our starting point is the devel...
We study stochastic Cubic Newton methods for solving general possibly non-convex minimization proble...
While first-order methods are popular for solving optimization problems that arise in large-scale de...
This paper introduces PROMISE ($\textbf{Pr}$econditioned Stochastic $\textbf{O}$ptimization $\textbf...