Stochastic gradient descent (SGD) remains the method of choice for deep learning, despite the limitations arising for ill-behaved objective functions. In cases where it could be estimated, the natural gradient has proven very effective at mitigating the catastrophic effects of pathological curvature in the objective function, but little is known theoretically about its convergence properties, and it has yet to find a practical implementation that would scale to very deep and large networks. Here, we derive an exact expression for the natural gradient in deep linear networks, which exhibit pathological curvature similar to the nonlinear case. We provide for the first time an analytical solution for its convergence rate, showing that the loss...
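The natural gradient the abstract above refers to preconditions the ordinary gradient by the inverse Fisher information matrix. As a minimal illustrative sketch (not the paper's deep-linear-network derivation), the update can be shown on least-squares regression, where for a Gaussian likelihood the Fisher matrix is F = XᵀX/n and exact preconditioning removes all curvature effects:

```python
import numpy as np

# Illustrative sketch only: natural gradient descent on least-squares
# regression. For a Gaussian likelihood the Fisher information is
# F = X^T X / n, so the natural gradient step is F^{-1} grad.
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])  # hypothetical ground-truth weights
y = X @ theta_true

theta = np.zeros(d)
lr = 1.0  # with the exact Fisher, lr = 1 solves this quadratic in one step
for _ in range(5):
    grad = X.T @ (X @ theta - y) / n            # gradient of 0.5 * mean squared error
    fisher = X.T @ X / n                        # Fisher information matrix
    theta = theta - lr * np.linalg.solve(fisher, grad)  # natural gradient step

print(np.allclose(theta, theta_true))  # True: exact recovery of the parameters
```

Because the Fisher matrix here coincides with the Hessian of the quadratic loss, the natural gradient step is Newton's step and converges in one iteration regardless of how ill-conditioned XᵀX is; this is the curvature-mitigation property the abstract describes, though scaling the Fisher inverse to very deep and large networks remains the open practical problem.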
In stochastic gradient descent (SGD) and its variants, the optimized gradient estimators may be as e...
Modern supervised learning techniques, particularly those using deep nets, involve fitting high dime...
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios w...
Training over-parameterized neural networks involves the empirical minimizatio...
The deep learning optimization community has observed how the neural networks generalization ability...
Natural gradient descent (NGD) is an on-line algorithm for redefining the steepest descent direction...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
While natural gradients have been widely studied from both theoretical and empirical perspectives, w...
Neumann K, Steil JJ. Intrinsic Plasticity via Natural Gradient Descent. In: Verleysen M, ed. 20th Eur...
When a parameter space has a certain underlying structure, the ordinary gradient of a function does ...
In this paper, we propose a geometric framework to analyze the convergence properties of gradient de...
While deep learning is successful in a num...
In the recent decade, deep neural networks have solved ever more complex tasks across many fronts in...