Short version of https://arxiv.org/abs/1709.01427. When applied to training deep neural networks, stochastic gradient descent (SGD) often incurs steady progression phases, interrupted by catastrophic episodes in which loss and gradient norm explode. A possible mitigation of such events is to slow down the learning process. This paper presents a novel approach, called SALERA, to control the SGD learning rate using two statistical tests. The first one, aimed at fast learning, compares the momentum of the normalized gradient vectors to that of random unit vectors and accordingly gracefully increases or decreases the learning rate. The second one is a change point detection test, aimed at the detection of catastrophic l...
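Based only on the description above, the following is a minimal sketch of the first test: it keeps a momentum (exponential moving average) of normalized gradient vectors and compares its norm to what an EMA of independent random unit vectors would produce, multiplying the learning rate up or down accordingly. The class name, the stationary-EMA reference norm, and the factors 1.1 and 0.9 are illustrative assumptions, not the paper's actual statistics or thresholds; the change point detection test for catastrophic episodes is omitted.

```python
# Hedged sketch of the learning-rate control idea in the SALERA abstract above
# (https://arxiv.org/abs/1709.01427). Names, constants, and the reference norm
# are illustrative assumptions, not the paper's actual procedure.
import numpy as np

class AgnosticRateController:
    def __init__(self, dim, lr=0.01, beta=0.9, up=1.1, down=0.9):
        self.lr = lr                      # current learning rate
        self.beta = beta                  # EMA factor for the gradient "momentum"
        self.up, self.down = up, down     # multiplicative adjustment factors (assumed)
        self.m = np.zeros(dim)            # EMA of normalized gradient vectors
        # Reference: stationary norm of an EMA of i.i.d. zero-mean unit vectors,
        # E||m||^2 = (1 - beta) / (1 + beta).  Used as the "random unit vector"
        # baseline the abstract compares against (an assumption of this sketch).
        self.ref_norm = np.sqrt((1 - beta) / (1 + beta))

    def step(self, grad):
        g = grad / (np.linalg.norm(grad) + 1e-12)        # normalized gradient
        self.m = self.beta * self.m + (1 - self.beta) * g
        # More alignment than random unit vectors -> larger momentum norm ->
        # increase the learning rate; otherwise decrease it gracefully.
        if np.linalg.norm(self.m) > self.ref_norm:
            self.lr *= self.up
        else:
            self.lr *= self.down
        return self.lr

# Toy usage: feed gradients from any training loop.
rng = np.random.default_rng(1)
ctrl = AgnosticRateController(dim=10)
for _ in range(100):
    lr = ctrl.step(rng.normal(size=10))  # unstructured gradients: rate drifts slowly
```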
Stochastic Gradient Descent algorithms (SGD) remain a popular optimizer for deep learning networks a...
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization meth...
Preprint. The practical performance of online stochastic gradient descent algorithms is highly depende...
The performance of stochastic gradient descent (SGD) depends critically on how learning rates are ...
Thesis (Ph.D.)--University of Washington, 2019. Tremendous advances in large scale machine learning an...
Recent work has established an empirically successful framework for adapting learning rates for stoc...
Large-scale machine learning problems can be reduced to non-convex optimization problems if state-of...
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization pr...
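As context for the abstracts above, here is a minimal sketch of plain SGD on a toy stochastic objective (mini-batch least squares). The data, batch size, and fixed step size are illustrative choices and do not come from any of the cited works; the point is to show the single quantity these papers adapt, namely the learning rate multiplying the stochastic gradient.

```python
# Minimal plain-SGD sketch on a synthetic least-squares problem.
# All quantities here are illustrative; none are taken from the cited papers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # synthetic inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)   # noisy targets

w = np.zeros(5)       # parameters to learn
lr = 0.05             # fixed learning rate (the quantity the cited papers adapt)
for step in range(500):
    idx = rng.integers(0, 1000, size=32)       # sample a mini-batch
    xb, yb = X[idx], y[idx]
    grad = xb.T @ (xb @ w - yb) / len(idx)     # stochastic gradient estimate
    w -= lr * grad                             # SGD update: w <- w - lr * g
print(np.linalg.norm(w - w_true))              # should be close to zero
```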