In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32-bit default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN, ImageNet). We observe considerable variations of backpropagation outputs, which occur around half of the time in 32-bit precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient.
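To make the phenomenon concrete, here is a minimal PyTorch sketch of a ReLU whose subgradient at 0 is configurable; the class name `ReLUAlpha` and the all-zeros toy input are illustrative assumptions, not the authors' code. Backpropagating through exact zeros returns different gradients for ReLU'(0) = 0 and ReLU'(0) = 1:

```python
import torch

class ReLUAlpha(torch.autograd.Function):
    """ReLU whose subgradient at 0 is a configurable alpha in [0, 1].
    Hypothetical helper for illustration, not the paper's implementation."""

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.save_for_backward(x)
        ctx.alpha = alpha
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Derivative: 1 where x > 0, 0 where x < 0, and alpha exactly at x == 0.
        grad = torch.where(x > 0, torch.ones_like(x),
                           torch.where(x < 0, torch.zeros_like(x),
                                       torch.full_like(x, ctx.alpha)))
        return grad_out * grad, None  # no gradient w.r.t. alpha

# The choice only matters at exact zeros; the abstract's point is that
# 32-bit rounding makes such zeros frequent in real training runs.
x = torch.zeros(4, dtype=torch.float32, requires_grad=True)
for alpha in (0.0, 1.0):
    y = ReLUAlpha.apply(x, alpha).sum()
    (g,) = torch.autograd.grad(y, x)
    print(f"alpha={alpha}: grad={g.tolist()}")
```

On this toy input the two choices yield all-zero versus all-one gradients; the abstract's observation is that float32 rounding produces exact zeros often enough for the two choices to split backpropagation outputs on real networks.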
In the article, emphasis is put on the modern artificial neural network structure, which in the literature...
For many reasons, neural networks have become very popular AI machine learning models. Two of the most...
We study finite sample...
For most state-of-the-art architectures, the Rectified Linear Unit (ReLU) has become a standard component a...
Motivated by the goal of enabling energy-efficient and/or lower-cost hardware implementations of deep...
By applying concepts from the statistical physics of learning, we study layered neural networks of r...
Activation function is a key component in deep learning that performs non-linear mappings between th...
Backpropagation learning algorithms typically collapse the network's structure into a single vector...
Classifiers used in the wild, in particular for safety-critical systems, should not only have good generalization...
We study the training of deep neural networks by gradient descent where floating-point arithmetic is...
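To illustrate why floating-point arithmetic can matter for gradient descent at all, here is a minimal NumPy sketch (the array of mock per-sample gradients is an assumption for illustration): two summation orders of the same values, equal in exact arithmetic, typically disagree in float32.

```python
import numpy as np

rng = np.random.default_rng(0)
# Mock per-sample gradient contributions (illustrative, not from any paper).
g = rng.standard_normal(1_000_000).astype(np.float32)

s1 = np.add.reduce(g)           # accumulate in the original order
s2 = np.add.reduce(np.sort(g))  # same values, different order
print(s1, s2, bool(s1 == s2))   # the two float32 sums typically differ
```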
Deep learning has shown impressive empirical breakthroughs, but many theoretical questions...
Deep Learning (DL) networks used in image segmentation tasks must be trained with input images and c...