The study of feature propagation at initialization in neural networks lies at the root of numerous initialization designs. An assumption very commonly made in the field states that the pre-activations are Gaussian. Although this convenient Gaussian hypothesis can be justified when the number of neurons per layer tends to infinity, it is challenged by both theoretical and experimental works for finite-width neural networks. Our major contribution is to construct a family of pairs of activation functions and initialization distributions that ensure that the pre-activations remain Gaussian throughout the network's depth, even in narrow neural networks. In the process, we discover a set of constraints that a neural network should fulfill to ens...
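A minimal illustration of the finite-width issue raised above (a sketch under assumed settings, not the construction proposed in the paper): simulate a narrow tanh MLP with i.i.d. Gaussian weights and track the empirical excess kurtosis of the pre-activations layer by layer. Under the Gaussian hypothesis the excess kurtosis would stay near zero, while finite-width effects are expected to pull it away from zero as depth grows. The width, depth, and gain values are illustrative choices only.

    import numpy as np

    # Sketch only: narrow fully connected tanh network with i.i.d. Gaussian weights.
    rng = np.random.default_rng(0)
    width, depth, n_samples = 8, 20, 10_000   # assumed, illustrative sizes
    gain = 1.0                                # weight variance scale (gain / fan_in)

    h = rng.standard_normal((n_samples, width))   # Gaussian inputs
    for layer in range(1, depth + 1):
        W = rng.standard_normal((width, width)) * np.sqrt(gain / width)
        z = h @ W.T                               # pre-activations of this layer
        flat = z.ravel()
        excess_kurtosis = ((flat - flat.mean()) ** 4).mean() / flat.var() ** 2 - 3.0
        print(f"layer {layer:2d}: excess kurtosis of pre-activations = {excess_kurtosis:+.3f}")
        h = np.tanh(z)                            # post-activations fed to the next layer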
Training a neural network (NN) depends on multiple factors, including but not limited to the initial...
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, give...
Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pr...
The goal of the present work is to propose a way to modify both the initializa...
The activation function deployed in a deep neural network has great influence on the performance of ...
Thesis (Ph.D.). To achieve better prediction performance, much research effort in deep/machine learnin...
The weight initialization and the activation function of deep neural networks have a crucial impact ...
Deep neural networks have had tremendous success in a wide range of applications where they achieve ...
Deep feedforward networks initialized along the edge of chaos exhibit exponentially superior trainin...
Thesis (MSc), Stellenbosch University, 2021. Recently, proper initialisation and st...
Recent years have witnessed an increasing interest in the correspondence between infinitely wide net...
Learning with neural networks depends on the particular parametrization of the functions represented...
Understanding the impact of data structure on the computational tractability of learning is a key ch...
To gain a deeper understanding of the behavior and learning dynamics of (deep) artificial neural net...
It is well known that direct training of deep neural networks will generally lead to poor results. ...