Short version of https://arxiv.org/abs/1709.01427. When applied to training deep neural networks, stochastic gradient descent (SGD) often incurs steady progression phases, interrupted by catastrophic episodes in which loss and gradient norm explode. A possible mitigation of such events is to slow down the learning process. This paper presents a novel approach, called SALERA, to control the SGD learning rate using two statistical tests. The first one, aimed at fast learning, compares the momentum of the normalized gradient vectors to that of random unit vectors and accordingly gracefully increases or decreases the learning rate. The second one is a change point detection test, aimed at the detection of catastrophic l...
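Based only on the description above, the following is a minimal sketch of the first test: it keeps a momentum (exponential moving average) of normalized gradient vectors and compares its norm to what an EMA of independent random unit vectors would produce, multiplying the learning rate up or down accordingly. The class name, the stationary-EMA reference norm, and the factors 1.1 and 0.9 are illustrative assumptions, not the paper's actual statistics or thresholds; the change point detection test for catastrophic episodes is omitted.

```python
# Hedged sketch of the learning-rate control idea in the SALERA abstract above
# (https://arxiv.org/abs/1709.01427). Names, constants, and the reference norm
# are illustrative assumptions, not the paper's actual procedure.
import numpy as np

class AgnosticRateController:
    def __init__(self, dim, lr=0.01, beta=0.9, up=1.1, down=0.9):
        self.lr = lr                      # current learning rate
        self.beta = beta                  # EMA factor for the gradient "momentum"
        self.up, self.down = up, down     # multiplicative adjustment factors (assumed)
        self.m = np.zeros(dim)            # EMA of normalized gradient vectors
        # Reference: stationary norm of an EMA of i.i.d. zero-mean unit vectors,
        # E||m||^2 = (1 - beta) / (1 + beta).  Used as the "random unit vector"
        # baseline the abstract compares against (an assumption of this sketch).
        self.ref_norm = np.sqrt((1 - beta) / (1 + beta))

    def step(self, grad):
        g = grad / (np.linalg.norm(grad) + 1e-12)        # normalized gradient
        self.m = self.beta * self.m + (1 - self.beta) * g
        # More alignment than random unit vectors -> larger momentum norm ->
        # increase the learning rate; otherwise decrease it gracefully.
        if np.linalg.norm(self.m) > self.ref_norm:
            self.lr *= self.up
        else:
            self.lr *= self.down
        return self.lr

# Toy usage: feed gradients from any training loop.
rng = np.random.default_rng(1)
ctrl = AgnosticRateController(dim=10)
for _ in range(100):
    lr = ctrl.step(rng.normal(size=10))  # unstructured gradients: rate drifts slowly
```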
Stochastic Gradient Descent algorithms (SGD) remain a popular optimizer for deep learning networks a...
We propose a new per-layer adaptive step-size procedure for stochastic first-order optimization meth...
Preprint. The practical performance of online stochastic gradient descent algorithms is highly depende...
The performance of stochastic gradient descent (SGD) depends critically on how learning rates are ...
Thesis (Ph.D.)--University of Washington, 2019. Tremendous advances in large scale machine learning an...
Recent work has established an empirically successful framework for adapting learning rates for stoc...
Large-scale machine learning problems can be reduced to non-convex optimization problems if state-of...
Stochastic gradient descent (SGD) is a simple and popular method to solve stochastic optimization pr...
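As context for the abstracts above, here is a minimal sketch of plain SGD on a toy stochastic objective (mini-batch least squares). The data, batch size, and fixed step size are illustrative choices and do not come from any of the cited works; the point is to show the single quantity these papers adapt, namely the learning rate multiplying the stochastic gradient.

```python
# Minimal plain-SGD sketch on a synthetic least-squares problem.
# All quantities here are illustrative; none are taken from the cited papers.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                 # synthetic inputs
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=1000)   # noisy targets

w = np.zeros(5)       # parameters to learn
lr = 0.05             # fixed learning rate (the quantity the cited papers adapt)
for step in range(500):
    idx = rng.integers(0, 1000, size=32)       # sample a mini-batch
    xb, yb = X[idx], y[idx]
    grad = xb.T @ (xb @ w - yb) / len(idx)     # stochastic gradient estimate
    w -= lr * grad                             # SGD update: w <- w - lr * g
print(np.linalg.norm(w - w_true))              # should be close to zero
```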