We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
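For a concrete view of the per-dimension update the abstract describes, below is a minimal NumPy sketch of an ADADELTA-style step; the decay rate rho, the smoothing constant eps, and the function and variable names are illustrative choices for this sketch rather than details taken from the abstract.

```python
import numpy as np

def adadelta_step(x, grad, acc_grad_sq, acc_update_sq, rho=0.95, eps=1e-6):
    """One ADADELTA-style update for a parameter vector x.

    acc_grad_sq and acc_update_sq hold decayed running averages of the
    squared gradients and squared updates, one entry per dimension, so
    the effective step size adapts separately for each parameter.
    """
    # Decay the accumulated squared gradients and mix in the new gradient.
    acc_grad_sq = rho * acc_grad_sq + (1.0 - rho) * grad ** 2

    # Per-dimension step: ratio of the RMS of past updates to the RMS of
    # gradients, applied to the current gradient (only first order info).
    update = -np.sqrt(acc_update_sq + eps) / np.sqrt(acc_grad_sq + eps) * grad

    # Decay the accumulated squared updates and mix in the new update.
    acc_update_sq = rho * acc_update_sq + (1.0 - rho) * update ** 2

    return x + update, acc_grad_sq, acc_update_sq
```

Note that no global learning rate appears in the sketch, which mirrors the abstract's claim that the method requires no manual tuning of a learning rate; only the decay and smoothing constants remain.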