The stochastic gradient method is currently the prevailing technology for training neural networks. Compared with classical gradient descent, the computation of the true gradient as an average over the data is replaced by a single randomly chosen element of the sum. When dealing with massive data, this bold approximation reduces the number of elementary gradient evaluations and lowers the cost of each iteration. The price to be paid is the appearance of oscillations and slow convergence, which often requires an excessive number of iterations. The aim of this thesis is to design an approach that is both: (i) more robust, using the fundamental methods that have been successfully proven in classical optimization, i...
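To make the contrast concrete, the following is a minimal sketch (not taken from the thesis) of the two update rules on a least-squares toy problem: full-batch gradient descent averages the gradient over all data points, while stochastic gradient descent uses one randomly chosen term of the sum per iteration. The data, step sizes, decay schedule, and iteration counts are illustrative assumptions.

    # Sketch only: full-batch gradient descent vs. stochastic gradient descent
    # on a least-squares problem (1/2n) * ||Ax - b||^2. All parameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 1000, 20
    A = rng.standard_normal((n, d))
    x_true = rng.standard_normal(d)
    b = A @ x_true + 0.1 * rng.standard_normal(n)

    def full_gradient(x):
        # True gradient: an average over all n samples (one full pass per evaluation).
        return A.T @ (A @ x - b) / n

    def sample_gradient(x, i):
        # Gradient of a single term of the sum: unbiased but noisy estimate of the full gradient.
        return A[i] * (A[i] @ x - b[i])

    # Classical gradient descent: expensive iterations, smooth decrease.
    x_gd = np.zeros(d)
    for _ in range(200):
        x_gd -= 0.01 * full_gradient(x_gd)

    # Stochastic gradient descent: cheap iterations, oscillatory path;
    # a decaying step size damps the oscillations near the solution.
    x_sgd = np.zeros(d)
    for t in range(2000):
        i = rng.integers(n)
        x_sgd -= 0.01 / (1 + 0.001 * t) * sample_gradient(x_sgd, i)

    print("GD error: ", np.linalg.norm(x_gd - x_true))
    print("SGD error:", np.linalg.norm(x_sgd - x_true))

Per iteration, SGD touches a single sample instead of all n, which is the cost advantage the abstract describes; the noise it introduces is what motivates the more robust schemes studied in the thesis.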
Deep neural networks (DNNs) are currently predominantly trained using first-order methods. Some of t...
Which numerical methods are ideal for training a neural network? In this report four different optim...
The notable changes over the current version: - worked example of convergence rates showing SAG can ...
We design four novel approximations of the Fisher Information Matrix (FIM) that plays a central role...
Second-order optimization methods have the ability to accelerate convergence by modifying the gradie...
Several studies have shown the ability of natural gradient descent to minimize the objective functio...
Neural networks are an important class of highly flexible and powerful models inspired by the struct...
Second-order optimization methods applied to train deep neural networks use the curvature informat...
For a long time, second-order optimization methods have been regarded as computationally inefficient...
The current scalable Bayesian methods for Deep Neural Networks (DNNs) often rely on the Fisher Infor...
In this dissertation, we are concerned with the advancement of optimization algorithms for training ...
Second-order optimizers are thought to hold the potential to speed up neural network training, but d...
Neural network models have become extremely widespread in recent years because of ...
We report research results on the training and use of an artificial neural network for the precondit...
Deep learning has recently become one of the most widely used techniques in the fiel...