Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. Here, we derive an exact expression for the natural gradient in deep linear networks, which exhibit pathological curvature similar to the nonlinear case. We provide for the first time an analytical solution for its convergence rate, showing that the loss...
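For reference, the natural gradient discussed in this abstract is the ordinary gradient preconditioned by the inverse Fisher information matrix. The standard update below is the textbook definition, not the paper's exact deep-linear-network expression (which is truncated here):

```latex
% Standard natural-gradient update; \eta is the learning rate, F the Fisher matrix.
\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{x,\, y \sim p_\theta}\!\left[ \nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^{\top} \right]
```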
Over the past decade, deep neural networks have solved ever more complex tasks across many fronts in...
When a parameter space has a certain underlying structure, the ordinary gradient of a function does ...
Understanding the implicit bias of training algorithms is of crucial importance in order to explain ...
Training over-parameterized neural networks involves the empirical minimizatio...
The deep learning optimization community has observed how neural networks' generalization ability...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
Natural gradient descent (NGD) is an on-line algorithm for redefining the steepest descent direction...
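As a concrete illustration of the NGD update described above, here is a minimal sketch for least-squares linear regression, where the Fisher information of the Gaussian likelihood is X^T X / n. The function name natural_gradient_step and the damping parameter are illustrative assumptions, not taken from any of the cited works:

```python
# Minimal natural-gradient step for least-squares linear regression.
# Assumes a Gaussian likelihood, whose Fisher information is F = X^T X / n.
import numpy as np

def natural_gradient_step(w, X, y, lr=1.0, damping=1e-6):
    """One step of w <- w - lr * F^{-1} grad, with a small damping term."""
    n = len(y)
    grad = X.T @ (X @ w - y) / n                      # gradient of 0.5 * mean squared error
    fisher = X.T @ X / n + damping * np.eye(len(w))   # Fisher of the Gaussian model, damped
    return w - lr * np.linalg.solve(fisher, grad)

# Toy usage: for this model the natural-gradient step coincides with the
# Gauss-Newton update, so a single step essentially recovers w*.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true
w = natural_gradient_step(np.zeros(5), X, y)
print(np.linalg.norm(w - w_true))  # ~0, up to the damping term
```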
While natural gradients have been widely studied from both theoretical and empirical perspectives, w...
While deep learning is successful in a num...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
In this paper, we propose a geometric framework to analyze the convergence properties of gradient de...
Neumann K, Steil JJ. Intrinsic Plasticity via Natural Gradient Descent. In: Verleysen M, ed. 20th Eur...
In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as e...
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios w...