The activation function deployed in a deep neural network has a great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both of these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance σ_b² of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy and in terms of training time.
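To make the design principle concrete, the sketch below implements a hypothetical activation that is exactly the identity on a linear region [-a, a] around the origin and saturates smoothly (tanh-like) outside it. The function name linearised_tanh, the half-width parameter a, and the loose rule that a should be chosen large relative to the bias variance σ_b² are illustrative assumptions, not the paper's specific construction or condition.

```python
import numpy as np

def linearised_tanh(x, a=1.0):
    """Hypothetical activation with a linear region of half-width `a` around the origin.

    Inside [-a, a] the function is exactly the identity; outside it saturates
    like tanh. Both the value and the first derivative are continuous at |x| = a.
    The half-width `a` stands in for "sufficiently large relative to the bias
    variance sigma_b^2"; the paper's precise condition is not reproduced here.
    """
    x = np.asarray(x, dtype=float)
    inside = np.abs(x) <= a
    # Identity on the linear region, shifted tanh saturation outside it.
    outside = np.sign(x) * (a + np.tanh(np.abs(x) - a))
    return np.where(inside, x, outside)

# Example: pre-activations that stay within [-a, a] at initialisation pass
# through with slope exactly 1, i.e. the layer behaves linearly on them,
# which is the regime the abstract associates with avoiding rapid correlation
# convergence and vanishing/exploding gradients.
z = np.linspace(-3, 3, 7)
print(linearised_tanh(z, a=1.0))
```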