Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth L increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists in scaling the output of each layer by a factor α_L. We show in a probabilistic setting that with standard i.i.d. initializations, the only non-trivial dynamics is for α_L = 1/√L (other choices lead either to explosion or to an identity mapping). This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation.
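The α_L = 1/√L scaling above is easy to illustrate. Below is a minimal sketch, assuming a fully connected residual block with ReLU activation and standard i.i.d. Gaussian initialization; the class name ScaledResNet, the dimensions, and the choice of activation are hypothetical, not the authors' code.

```python
# Minimal sketch of a depth-L residual network whose residual branches are
# scaled by alpha_L = 1/sqrt(L), with standard i.i.d. Gaussian initialization.
# Names, dimensions, and the ReLU activation are illustrative assumptions.
import math

import torch
import torch.nn as nn


class ScaledResNet(nn.Module):
    def __init__(self, dim: int, depth: int):
        super().__init__()
        self.alpha = 1.0 / math.sqrt(depth)  # alpha_L = 1/sqrt(L)
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
        for layer in self.layers:
            # Standard i.i.d. initialization: W_ij ~ N(0, 1/dim).
            nn.init.normal_(layer.weight, std=1.0 / math.sqrt(dim))
            nn.init.zeros_(layer.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual update h_{k+1} = h_k + alpha_L * relu(W_k h_k + b_k).
        for layer in self.layers:
            x = x + self.alpha * torch.relu(layer(x))
        return x


# At initialization, hidden-state norms stay of the same order as the input's
# even for very deep networks.
net = ScaledResNet(dim=64, depth=1000)
x = torch.randn(8, 64)
print(net(x).norm(dim=1))
```

In this sketch, α_L = 1/√L keeps the hidden-state norm of the same order as the input's at initialization; replacing self.alpha with 1.0 produces the explosion described above, while 1/L degenerates toward the identity mapping, matching the trichotomy in the abstract.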
Deep learning algorithms are responsible for a technological revolution in a variety of tasks including...
Residual deep neural networks (ResNets) are mathematically described as interacting particle systems...
We show that information theoretic quantities can be used to control and describe the training proce...
We investigate the asymptotic properties of deep residual networks (ResNets) as the number of layers...
Randomly initialized neural networks are known to become harder to train with ...
It took until the last decade to finally see a machine match human performance on essentially any task...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
Two distinct limits for deep learning have been derived as the network width h → ∞, depending...
Recent results in the ...
Deep feedforward networks initialized along the edge of chaos exhibit exponentially superior training...
Deep learning has become an important toolkit for data science and artificial intelligence. In contrast...
This paper underlines a subtle property of batch-normalization (BN): Successive...
Residual networks (ResNets) have significantly better trainability and thus performance than feed-forward...
Recent analyses of neural networks with shaped activations (i.e. the activation function is scaled as...