Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. Here, we derive an exact expression for the natural gradient in deep linear networks, which exhibit pathological curvature similar to the nonlinear case. We provide for the first time an analytical solution for its convergence rate, showing that the loss...
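The natural gradient the abstract above refers to preconditions the ordinary gradient by the inverse Fisher information matrix. As a minimal illustrative sketch (not the paper's deep-linear-network derivation), the update can be shown on least-squares regression, where for a Gaussian likelihood the Fisher matrix is F = XᵀX/n and exact preconditioning removes all curvature effects:

```python
import numpy as np

# Illustrative sketch only: natural gradient descent on least-squares
# regression. For a Gaussian likelihood the Fisher information is
# F = X^T X / n, so the natural gradient step is F^{-1} grad.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])  # hypothetical ground-truth weights
y = X @ theta_true

theta = np.zeros(d)
lr = 1.0  # with the exact Fisher, lr = 1 solves this quadratic in one step
for _ in range(5):
    grad = X.T @ (X @ theta - y) / n            # gradient of 0.5 * mean squared error
    fisher = X.T @ X / n                        # Fisher information matrix
    theta = theta - lr * np.linalg.solve(fisher, grad)  # natural gradient step

print(np.allclose(theta, theta_true))  # True: exact recovery of the parameters
```

Because the Fisher matrix here coincides with the Hessian of the quadratic loss, the natural gradient step is Newton's step and converges in one iteration regardless of how ill-conditioned XᵀX is; this is the curvature-mitigation property the abstract describes, though scaling the Fisher inverse to very deep and large networks remains the open practical problem.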
In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as e...
Modern supervised learning techniques, particularly those using deep nets, involve fitting high dime...
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios w...
Training over-parameterized neural networks involves the empirical minimizatio...
The deep learning optimization community has observed how the neural networks generalization ability...
Natural gradient descent (NGD) is an on-line algorithm for redefining the steepest descent direction...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
While natural gradients have been widely studied from both theoretical and empirical perspectives, w...
Neumann K, Steil JJ. Intrinsic Plasticity via Natural Gradient Descent. In: Verleysen M, ed. 20th Eur...
When a parameter space has a certain underlying structure, the ordinary gradient of a function does ...
In this paper, we propose a geometric framework to analyze the convergence properties of gradient de...
While deep learning is successful in a num...
In the recent decade, deep neural networks have solved ever more complex tasks across many fronts in...