The practical performance of online stochastic gradient descent algorithms is highly dependent on the chosen step size, which must be tediously hand-tuned in many applications. The same is true for more advanced variants of stochastic gradient methods, such as SAGA, SVRG, or AdaGrad. Here we propose to adapt the step size by performing a gradient descent on the step size itself, viewing the whole performance of the learning trajectory as a function of the step size. Importantly, this adaptation can be computed online at little cost, without having to iterate backward passes over the full data.
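A minimal sketch of this general idea, written as a hypergradient-style update that nudges the step size using the inner product of successive stochastic gradients on a toy least-squares problem. The function name, the quadratic model, and the hyper-step-size beta are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

def sgd_with_adaptive_step_size(X, y, n_epochs=5, eta0=0.01, beta=1e-4, seed=0):
    """Online least-squares SGD whose step size eta is itself adapted by a
    gradient step.  Sketch only: d(loss)/d(eta) is approximated by
    -<g_t, g_{t-1}>, so eta grows when successive gradients agree and
    shrinks when they disagree."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    eta = eta0
    prev_grad = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            # per-example gradient of the squared error 0.5 * (x.w - y)^2
            err = X[i] @ w - y[i]
            grad = err * X[i]
            # hypergradient step on the step size itself (kept positive)
            eta = max(eta + beta * float(grad @ prev_grad), 1e-8)
            # ordinary SGD step with the adapted step size
            w -= eta * grad
            prev_grad = grad
    return w, eta

# usage on synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.01 * rng.normal(size=200)
w, eta = sgd_with_adaptive_step_size(X, y)
print("recovered w:", np.round(w, 3), "final eta:", eta)
```

The point of the sketch is only the mechanism: the step size is treated as one more parameter updated online from quantities already available during the SGD pass, with no backward replay over the full data.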
Short version of https://arxiv.org/abs/1709.01427. When applied to training deep...
Online learning methods for sequentially arriving data are growing in popularity. Alternative batch ...
Gradient-based methods are often used for optimization. They form the basis of several neural networ...
We present an online Support Vector Machine (SVM) that uses Stochastic Meta-Descent (SMD) to adapt i...
The article examines in some detail the convergence rate and mean-square-error performance of moment...
Stochastic Gradient Descent (SGD) is a workhorse in machine learning, yet its ...
This paper examines the convergence rate and mean-square-error performance of momentum stochastic gr...
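For reference, the momentum (heavy-ball) stochastic gradient update studied in these papers takes the standard form v <- mu*v - eta*g, w <- w + v. A minimal sketch, where the function name and the noisy quadratic test problem are illustrative assumptions:

```python
import numpy as np

def momentum_sgd(grad_fn, w0, eta=0.05, mu=0.9, n_steps=500, seed=0):
    """Heavy-ball / momentum SGD: v <- mu*v - eta*g,  w <- w + v.
    grad_fn(w, rng) should return a stochastic gradient estimate at w."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)
    for _ in range(n_steps):
        g = grad_fn(w, rng)
        v = mu * v - eta * g
        w = w + v
    return w

# usage: noisy gradients of the quadratic 0.5 * ||w - w_star||^2
w_star = np.array([1.0, -2.0, 0.5])
noisy_grad = lambda w, rng: (w - w_star) + 0.1 * rng.normal(size=w.shape)
print(momentum_sgd(noisy_grad, np.zeros(3)))
```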