Recently, we proposed to transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero output and zero slope on average, and to use separate shortcut connections to model the linear dependencies instead. We continue this work, first by introducing a third transformation that normalizes the scale of the outputs of each hidden neuron, and second by analyzing the connections to second-order optimization methods. We show that the transformations make simple stochastic gradient descent behave more like second-order optimization methods and thus speed up learning. We show this both in theory and with experiments. The experiments on the third transformation show that while it further increases the speed of learning, it can a...
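The transformations described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The tanh nonlinearity, the hypothetical names `transform_hidden`, `forward`, `A`, `B`, `C`, and the computation of α, β, γ on the current batch are illustrative choices. The rule that sets the scale γ so each neuron has unit standard deviation is also an assumption. Each hidden unit computes γ(tanh(a) + αa + β). Here α cancels the average slope and β cancels the average output. A separate linear shortcut carries the linear dependencies.

```python
import numpy as np


def transform_hidden(a, eps=1e-8):
    """Transform tanh hidden units to zero mean output, zero mean slope and
    (assumed third step) unit output standard deviation.

    a: pre-activations, shape (T, n_hidden), one row per training sample.
    """
    t = np.tanh(a)
    slope = 1.0 - t ** 2                   # d tanh(a) / da
    alpha = -slope.mean(axis=0)            # zero average slope of tanh(a) + alpha*a
    beta = -(t + alpha * a).mean(axis=0)   # zero average output
    h = t + alpha * a + beta
    # Assumed scale rule (hypothetical): rescale each neuron to unit std.
    gamma = 1.0 / np.maximum(h.std(axis=0), eps)
    return gamma * h, alpha, beta, gamma


def forward(x, A, B, C):
    """MLP with one transformed hidden layer plus a separate linear shortcut C.

    x: (T, n_in), B: (n_in, n_hidden), A: (n_hidden, n_out), C: (n_in, n_out).
    """
    h, _, _, _ = transform_hidden(x @ B)
    return h @ A + x @ C   # nonlinear part + linear shortcut part


rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=2.0, size=(1000, 3))   # deliberately non-centered inputs
B = rng.normal(size=(3, 4))
A = rng.normal(size=(4, 2))
C = rng.normal(size=(3, 2))
y = forward(x, A, B, C)
print(y.shape)  # (1000, 2)
```

Scaling by γ keeps both averages at zero, because it multiplies a zero-mean output and a zero-mean slope. Because the statistics are computed over the batch, this sketch is not meant for single-sample inference.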
Optimization is the key component of deep learning. Increasing depth, which is vital for reaching a...
Gradient-following learning methods can encounter problems of implementation in many applications, ...
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent ...
Recently, we proposed to transform the outputs of each hidden neuron in a multi-layer per...
Neural networks are an important class of highly flexible and powerful models inspired by the struct...
Second-order optimizers are thought to hold the potential to speed up neural network training, but d...
This paper proposes an improved stochastic second-order learning algorithm for supervised neural net...
When a parameter space has a certain underlying structure, the ordinary gradient of a function does ...
Understanding intelligence and how it allows humans to learn, to make decisions and form memories, is...
Many connectionist learning algorithms consist of minimizing a cost of the form C(w) = E(J(z; w)) ...
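One standard way to minimize a cost of the form C(w) = E(J(z; w)) is stochastic gradient descent. It replaces the expectation with a single sampled z per update, w ← w − η ∇_w J(z; w). The sketch below is a hypothetical illustration, not the cited work's method. The learning rate `lr`, the step count, and the toy squared-error cost for a linear model y ≈ w[0]·x + w[1] are all assumptions.

```python
import numpy as np


def sgd(grad_J, sample_z, w0, lr=0.01, steps=5000, seed=0):
    """Minimize C(w) = E[J(z; w)] using one random sample z per update."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        z = sample_z(rng)          # draw one example z ~ p(z)
        w -= lr * grad_J(z, w)     # noisy but unbiased step on C(w)
    return w


# Hypothetical data: z = (x, y) with y = 2*x - 1 + small noise.
def sample_z(rng):
    x = rng.normal()
    return x, 2.0 * x - 1.0 + 0.1 * rng.normal()


# Hypothetical per-sample cost J(z; w) = 0.5 * (y - w[0]*x - w[1])**2.
def grad_J(z, w):
    x, y = z
    r = y - (w[0] * x + w[1])      # residual
    return np.array([-r * x, -r])  # gradient of J with respect to w


w = sgd(grad_J, sample_z, w0=[0.0, 0.0])
print(np.round(w, 2))
```

Each step uses the gradient of J on one sample instead of C, so the iterates fluctuate around the minimizer (here close to w = [2, −1]) with an amplitude that shrinks as `lr` is reduced.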
In this dissertation, we are concerned with the advancement of optimization algorithms for training ...
We investigate a new approach to compute the gradients of artificial neural networks (ANNs), based o...
We propose a fast second-order method that can be used as a drop-in replacement for current deep lea...
Gradient-following learning methods can encounter problems of implementation in many applications, a...
Rapid advances in data collection and processing capabilities have allowed for the use of increasing...