Recent analyses of neural networks with shaped activations (i.e., the activation function is scaled as the network size grows) have led to scaling limits described by differential equations. However, these results do not a priori tell us anything about "ordinary" unshaped networks, where the activation is unchanged as the network size grows. In this article, we find similar differential-equation-based asymptotic characterizations for two types of unshaped networks. First, we show that the following two architectures converge to the same infinite-depth-and-width limit at initialization: (i) a fully connected ResNet with a $d^{-1/2}$ factor on the residual branch, where $d$ is the network depth; and (ii) a multilayer perceptron (MLP) with depth...
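For concreteness, architecture (i) can be sketched as below: a residual MLP at random Gaussian initialization whose residual branch carries a $d^{-1/2}$ factor. This is a minimal illustrative sketch, not the paper's code; the width, depth, tanh activation, and $N(0, 1/\text{width})$ weight scaling are assumptions chosen only to make the scaling visible.

```python
import numpy as np

def resnet_forward(x, depth=200, rng=None):
    """Forward pass of a residual MLP at initialization:
        h_{l+1} = h_l + depth**-0.5 * phi(W_l @ h_l),
    with i.i.d. weights W_l ~ N(0, 1/width) (illustrative assumption).
    """
    rng = rng or np.random.default_rng(0)
    width = x.shape[0]
    phi = np.tanh  # unshaped activation: fixed, not rescaled with network size
    h = x.copy()
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        h = h + depth ** -0.5 * phi(W @ h)  # d^{-1/2} factor on the residual branch
    return h

# Example: the hidden-state norm stays O(1) as depth grows, unlike an
# unscaled residual stack.
x = np.random.default_rng(1).standard_normal(256)
print(np.linalg.norm(x), np.linalg.norm(resnet_forward(x, depth=400)))
```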
This paper underlines a subtle property of batch-normalization (BN): Successiv...
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent i...
Deep feedforward networks initialized along the edge of chaos exhibit exponentially superior trainin...
We investigate the asymptotic properties of deep Residual networks (ResNets) as the number of layers...
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, give...
We develop a mathematically rigorous framework for multilayer neural networks in the mean field regi...
Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks...
We contribute to a better understanding of the class of functions that can be represented by a neura...
In this note, we study how neural networks with a single hidden layer and ReLU activation interpolat...
We study the effect of normalization on the layers of deep neural networks of feed-forward type. A g...
We establish in this work approximation results of deep neural networks for smooth functions measure...
It took until the last decade to finally see a machine match human performance on essentially any ta...
In supervised learning, the regularization path is sometimes used as a convenient theoretical proxy ...
Deep neural networks (DNNs) defy the classical bias-variance trade-off: adding parameters to a DNN t...
Given any deep fully connected neural network, initialized with random Gaussian parameters, we bound...