In the context of optimizing Deep Neural Networks, we propose to rescale the learning rate using a new automatic-differentiation technique. This technique relies on the computation of the {\em curvature}, a second-order quantity whose computational complexity lies between that of the gradient and that of the Hessian-vector product. If (1C,1M) represents respectively the computational time and memory footprint of the gradient method, the new technique increases the overall cost to either (1.5C,2M) or (2C,1M). This rescaling has the appealing property of a natural interpretation: it lets the practitioner trade off exploration of the parameter set against convergence of the algorithm. The resca...
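The abstract gives no code, and its technique is reportedly cheaper than a full Hessian-vector product, so the following is only a minimal sketch of the underlying idea: the curvature along the current gradient direction, g^T H g / g^T g, computed with standard forward-over-reverse automatic differentiation (roughly twice the cost of one gradient, matching the (2C,1M) regime). The loss, the `curvature_along_gradient` helper, and the clipping constant are illustrative assumptions, not the paper's actual method.

```python
import jax
import jax.numpy as jnp

def loss(theta, x, y):
    # Illustrative least-squares objective; the paper's loss is not specified.
    pred = jnp.tanh(x @ theta)
    return jnp.mean((pred - y) ** 2)

def curvature_along_gradient(theta, x, y):
    # g = dL/dtheta via reverse-mode autodiff.
    g = jax.grad(loss)(theta, x, y)
    # Hg via forward-over-reverse: a jvp of the gradient map in direction g.
    _, hg = jax.jvp(lambda t: jax.grad(loss)(t, x, y), (theta,), (g,))
    # Normalised directional curvature g^T H g / g^T g.
    return jnp.vdot(g, hg) / jnp.vdot(g, g), g

def rescaled_step(theta, x, y, base_lr=1.0):
    kappa, g = curvature_along_gradient(theta, x, y)
    # Newton-like rescaling of the learning rate along the gradient direction;
    # the floor keeps the step well-defined when curvature is tiny or negative.
    lr = base_lr / jnp.maximum(kappa, 1e-8)
    return theta - lr * g
```

Here theta is assumed to be a flat parameter vector; for a real network one would flatten the parameter pytree first (e.g. with jax.flatten_util.ravel_pytree).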
We analyse the dynamics of a number of second order on-line learning algorithms training multi-layer...
Fractional calculus is an emerging topic in artificial neural network training, especially when usin...
Incorporating second-order curvature information into machine learning optimization algorithms can b...
Second-order optimization methods applied to train deep neural networks use the curvature informat...
The deep learning community has devised a diverse set of methods to make gradient optimization, usin...
Neural networks are an important class of highly flexible and powerful models inspired by the struct...
In this paper, we incorporate the Barzilai-Borwein step size into gradient descent methods used to t...
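For context on the Barzilai-Borwein (BB) step size mentioned in the entry above: the classical BB1 rule sets eta_k = (s^T s)/(s^T y), with s = theta_k - theta_{k-1} and y = g_k - g_{k-1}. The truncated abstract does not say which BB variant or stabilisation the paper uses, so this is a sketch of the textbook deterministic rule, not the paper's method:

```python
import jax
import jax.numpy as jnp

def bb_gradient_descent(loss, theta, num_steps=100, eta0=1e-3):
    """Gradient descent with the long Barzilai-Borwein step size
    eta_k = (s^T s) / (s^T y)."""
    grad_fn = jax.grad(loss)
    g_prev = grad_fn(theta)
    theta_prev = theta
    theta = theta - eta0 * g_prev            # first step uses a fixed eta0
    for _ in range(num_steps - 1):
        g = grad_fn(theta)
        s = theta - theta_prev               # parameter displacement
        y = g - g_prev                       # gradient displacement
        eta = jnp.vdot(s, s) / jnp.vdot(s, y)  # BB1 step size
        theta_prev, g_prev = theta, g
        theta = theta - eta * g
    return theta
```

Note that in the stochastic setting abstracts like this one target, the raw BB ratio is noisy and is typically smoothed or averaged over epochs before use.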
Training deep neural networks is difficult due to the pathological curvature problem. Re-p...
In this dissertation, we are concerned with the advancement of optimization algorithms for training ...
In the recent decade, deep neural networks have solved ever more complex tasks across many fronts in...
We propose a fast second-order method that can be used as a drop-in replacement for current deep lea...
Backpropagation using Stochastic Diagonal Approximate Greatest Descent (SDAGD) is a novel adaptive s...
In modern supervised learning, many deep neural networks are able to interpolate the data: the empir...
The highly non-linear nature of deep neural networks causes them to be susceptible to adversarial ex...