Substantial work indicates that the dynamics of neural networks (NNs) are closely tied to how their parameters are initialized. Inspired by the phase diagram for two-layer ReLU NNs at infinite width (Luo et al., 2021), we take a step toward drawing a phase diagram for three-layer ReLU NNs at infinite width. First, we derive a normalized gradient flow for three-layer ReLU NNs and obtain two key independent quantities that distinguish the dynamical regimes of common initialization methods. Through carefully designed experiments at large computational cost, on both synthetic and real datasets, we find that the dynamics of each layer can likewise be divided into a linear regime and a condensed regime, separated by a critical regime...
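As an illustrative aside (not from the paper; the toy data, `init_scale`, and all hyperparameters below are our own assumptions), the linear-versus-condensed distinction can be probed numerically: in the linear (lazy) regime parameters barely move relative to initialization, while under small initialization the weights move far and tend to condense onto a few directions. A minimal PyTorch sketch for a two-layer ReLU network:

```python
# Illustrative sketch only (not the paper's code): probe linear/lazy vs.
# condensed dynamics of a two-layer ReLU net by varying the init scale.
import torch

torch.manual_seed(0)
X = torch.linspace(-1, 1, 32).unsqueeze(1)   # toy 1-D inputs
y = torch.sin(3 * X)                         # toy regression targets

def train(init_scale, width=512, steps=2000, lr=1e-2):
    W = torch.randn(width, 1) * init_scale   # input-layer weights
    a = torch.randn(width, 1) * init_scale   # output-layer weights
    W.requires_grad_(); a.requires_grad_()
    theta0 = torch.cat([W.detach().flatten(), a.detach().flatten()])
    opt = torch.optim.SGD([W, a], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = torch.relu(X @ W.T) @ a / width   # width-normalized output
        loss = ((pred - y) ** 2).mean()
        loss.backward()
        opt.step()
    theta = torch.cat([W.detach().flatten(), a.detach().flatten()])
    # Relative parameter motion: small in the linear (lazy) regime,
    # large when weights condense under small initialization.
    return loss.item(), ((theta - theta0).norm() / theta0.norm()).item()

for scale in (1.0, 1e-2):
    loss, rel = train(scale)
    print(f"init_scale={scale:g}: loss={loss:.4f}  rel. param change={rel:.2f}")
```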
Two distinct limits for deep learning have been derived as the network width h → ∞, depending...
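For context, a standard way to write the two width-h limits this snippet refers to (our notation; the snippet itself is truncated): with i.i.d. O(1) initialization, the NTK (lazy) and mean-field parameterizations of a two-layer network are

\[
f_{\mathrm{NTK}}(x) = \frac{1}{\sqrt{h}} \sum_{k=1}^{h} a_k\,\sigma(w_k^\top x),
\qquad
f_{\mathrm{MF}}(x) = \frac{1}{h} \sum_{k=1}^{h} a_k\,\sigma(w_k^\top x).
\]

As h → ∞ the former stays in a kernel (linear) regime around initialization, while the latter retains feature learning.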
We train a neural network model to predict the full phase space evolution of cosmological N-body simulations...
Rectified linear units (ReLUs) have become the main model for the neural units in current deep learning...
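For reference (a standard definition, supplied here rather than taken from the truncated snippet), the rectified linear unit and its derivative are

\[
\sigma(x) = \max(0, x), \qquad \sigma'(x) = \mathbf{1}[x > 0],
\]

so a ReLU network is piecewise linear in its inputs and gradients flow only through active units.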
Empirical works show that for ReLU neural networks (NNs) with small initialization, input weights of...
We investigate layered neural networks with a differentiable activation function and student vectors w...
This work reports deep-learning-unique first-order and second-order phase transitions, whose phenome...
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve...
We analyze feature learning in infinite-width neural networks trained with gradient flow through a s...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
We report numerical studies of the "memory-loss" phase transition in Hopfield-like symme...
Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural networks due to the fa...
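As a hedged illustration (our own sketch, not the paper's code; the toy MLP and the `grad_vector` helper are assumptions), the empirical NTK Θ(x, x′) = ⟨∇_θ f(x), ∇_θ f(x′)⟩ can be computed directly with autograd:

```python
# Empirical NTK Gram matrix for a small MLP (illustrative sketch).
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
params = list(net.parameters())

def grad_vector(x):
    # Flattened gradient of the scalar output f(x) w.r.t. all parameters.
    out = net(x.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, params)
    return torch.cat([g.flatten() for g in grads])

xs = torch.linspace(-1, 1, 5).unsqueeze(1)       # five probe inputs
G = torch.stack([grad_vector(x) for x in xs])    # |data| x |params| Jacobian
ntk = G @ G.T                                    # empirical NTK Gram matrix
print(ntk)
```

At infinite width under the NTK parameterization, this matrix remains essentially constant during training, which is what makes NTK analysis tractable.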
In practice, multi-task learning (through learning features shared among tasks) is an essential prop...
[Stray figure caption; image not recovered] The columns correspond to the quantities f_{λ>0} (left), T_av (middle), and ρ_rms (right), as defined in the text.
Understanding capabilities and limitations of different network architectures is of fundamental importance...