Adaptive gradient methods have shown excellent performance on many machine learning problems. Although many adaptive gradient methods have been studied recently, they mainly focus on either empirical or theoretical aspects, and each works only for specific problems with a particular choice of adaptive learning rate. It is therefore desirable to design a universal framework of practical adaptive gradient algorithms with theoretical guarantees for general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework can flexibly integrate the momentum and variance redu...
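To make the idea of a universal adaptive matrix concrete, the following is a minimal sketch, assuming a diagonal Adam-style choice of the adaptive matrix and a STORM-style momentum/variance-reduced gradient estimator; the function name, parameter names, and default values are illustrative and may not match the paper's exact update rule.

```python
import numpy as np

def adaptive_matrix_step(x, grad, grad_prev_same_batch, m, v,
                         gamma=0.01, mu=0.1, beta=0.99, lam=1e-3):
    """Hedged sketch: one step of a generalized adaptive update x_{t+1} = x_t - gamma * H_t^{-1} m_t.

    grad                 -- stochastic gradient at the current iterate x_t
    grad_prev_same_batch -- gradient at the previous iterate, evaluated on the same minibatch
    """
    # STORM-style variance-reduced momentum: m_t = g_t + (1 - mu) * (m_{t-1} - g(x_{t-1}))
    m = grad + (1 - mu) * (m - grad_prev_same_batch)
    # One admissible adaptive matrix H_t: Adam/AdaGrad-style diagonal scaling
    v = beta * v + (1 - beta) * grad ** 2
    h_diag = np.sqrt(v) + lam
    # Generalized adaptive step
    x = x - gamma * m / h_diag
    return x, m, v
```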
We aim to make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochas...
This thesis aims at developing efficient optimization algorithms for solving large-scale machine lea...
Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning ...
In this paper, we propose a class of faster adaptive Gradient Descent Ascent (GDA) methods for solvin...
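As a rough illustration of gradient descent ascent with adaptive learning rates (not the paper's exact method), here is a minimal sketch of one coordinate-wise-scaled GDA step for a min-max problem min_x max_y f(x, y); all names and default values are assumptions.

```python
import numpy as np

def adaptive_gda_step(x, y, gx, gy, vx, vy, lr_x=1e-3, lr_y=1e-3, beta=0.999, eps=1e-8):
    """One adaptive GDA step: descend in x, ascend in y, each with Adam-style diagonal scaling.

    gx, gy -- stochastic gradients of f with respect to x and y at the current point
    """
    vx = beta * vx + (1 - beta) * gx ** 2      # second-moment estimate for the primal block
    vy = beta * vy + (1 - beta) * gy ** 2      # second-moment estimate for the dual block
    x = x - lr_x * gx / (np.sqrt(vx) + eps)    # descent on the primal variable
    y = y + lr_y * gy / (np.sqrt(vy) + eps)    # ascent on the dual variable
    return x, y, vx, vy
```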
Distributed adaptive stochastic gradient methods have been widely used for large-scale nonconvex opt...
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective ...
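For reference, a minimal sketch of the Adam update described in the abstract, using the default hyper-parameters reported in the paper (step size 1e-3, beta1=0.9, beta2=0.999, eps=1e-8); the helper name is mine.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates plus bias correction."""
    m = beta1 * m + (1 - beta1) * grad            # first moment (EMA of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (EMA of squared gradients)
    m_hat = m / (1 - beta1 ** t)                  # bias-corrected first moment (t starts at 1)
    v_hat = v / (1 - beta2 ** t)                  # bias-corrected second moment
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```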
In this paper we propose several adaptive gradient methods for stochastic optimization. Our methods ...
Adaptive methods such as Adam and RMSProp are ...
We present a novel method for convex unconstrained optimization that, without any modifications, ens...
AdaBelief, one of the current best optimizers, demonstrates superior generalization ability over the...
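A minimal sketch of the AdaBelief update, which rescales the step by the squared deviation of the gradient from its exponential moving average (the "belief") rather than by the raw squared gradient; names and default values are illustrative.

```python
import numpy as np

def adabelief_step(param, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief update: adapt the step to the deviation (grad - m)^2 instead of grad^2."""
    m = beta1 * m + (1 - beta1) * grad                     # EMA of gradients
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps    # EMA of squared deviations
    m_hat = m / (1 - beta1 ** t)                           # bias correction (t starts at 1)
    s_hat = s / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(s_hat) + eps)
    return param, m, s
```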
Adaptive Moment Estimation (Adam), which combines Adaptive Learning Rate and Momentum, would be the ...
Stochastic gradient descent is the method of choice for solving large-scale optimization problems in...
Nesterov's accelerated gradient (AG) is a popular technique to optimize objective functions comprisi...
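For context, a minimal sketch of a standard Nesterov accelerated-gradient step in its look-ahead form; the specific variant analyzed in the paper (and the structure of its objective) may differ, and the names here are assumptions.

```python
def nesterov_step(param, velocity, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov AG step: evaluate the gradient at a look-ahead point, then update."""
    lookahead = param + momentum * velocity            # look-ahead extrapolation
    velocity = momentum * velocity - lr * grad_fn(lookahead)
    param = param + velocity
    return param, velocity
```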
In this paper, we investigate the convergence properties of a wide class of Adam-family methods for ...
We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise σ ...
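As one concrete example of a noise-adaptive step-size rule (not necessarily the scheme proposed in this paper), here is a sketch of AdaGrad-Norm, where the step size shrinks with the accumulated squared gradient norms; the function name and defaults are assumptions.

```python
import numpy as np

def adagrad_norm_sgd(grad_fn, x0, eta=1.0, b0=1e-2, num_steps=1000):
    """SGD with the AdaGrad-Norm step size eta_t = eta / sqrt(b0^2 + sum_i ||g_i||^2)."""
    x = np.asarray(x0, dtype=float)
    b_sq = b0 ** 2
    for _ in range(num_steps):
        g = grad_fn(x)                    # stochastic gradient at the current iterate
        b_sq += np.dot(g, g)              # accumulate squared gradient norm
        x = x - (eta / np.sqrt(b_sq)) * g
    return x
```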