Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. ...
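As a concrete anchor for the dynamics described above, here is a minimal sketch (not code from the paper; all dimensions and hyperparameters are illustrative) of full-batch gradient descent on a two-layer linear network. The input-output map is linear, but the loss is quartic in the weights, and a small random initialization produces the plateau-then-rapid-drop behavior the abstract mentions.

```python
import numpy as np

# Illustrative sketch: full-batch gradient descent on a two-layer *linear*
# network y = W2 @ W1 @ x. The input-output map is linear, but the squared
# loss is quartic in the weights, so the dynamics are nonlinear: a small
# initialization sits on a plateau before the loss drops rapidly.
rng = np.random.default_rng(0)
d, h, n = 10, 10, 200                    # input dim, hidden width, samples
X = rng.standard_normal((d, n))
T = rng.standard_normal((d, d)) @ X      # targets from a random linear teacher

W1 = 1e-4 * rng.standard_normal((h, d))  # small init -> long plateau
W2 = 1e-4 * rng.standard_normal((d, h))
lr = 1e-3

for step in range(15000):
    E = W2 @ W1 @ X - T                  # residual, shape (d, n)
    g2 = E @ (W1 @ X).T / n              # dL/dW2
    g1 = W2.T @ E @ X.T / n              # dL/dW1
    W1 -= lr * g1
    W2 -= lr * g2
    if step % 1000 == 0:
        print(step, 0.5 * np.mean(E**2)) # hovers near its initial value, then collapses
```

The printed loss stays close to its starting value for the first stretch of training, then falls quickly as the weight product aligns with the strongest modes of the teacher, mode by mode.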
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrization ...
This work derives an analytical expression for the global minima of a deep linear network with weight ...
Autoencoders are the simplest neural networks for unsupervised learning, and thus an ideal framework ...
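One reason autoencoders are such a tractable testbed, illustrated below with a minimal sketch (the dimensions and training setup are assumptions, not this paper's): a linear autoencoder trained on squared reconstruction error recovers the top principal subspace of the data, a classical fact connecting autoencoder training to PCA.

```python
import numpy as np

# Illustrative sketch: a linear autoencoder x -> D @ (E @ x) trained with
# squared reconstruction error. Its optimum spans the top-k principal
# subspace of the data (a classical result), which makes linear
# autoencoders a convenient object for theory.
rng = np.random.default_rng(1)
d, k, n = 8, 2, 500
X = rng.standard_normal((n, d)) @ np.diag([3.0, 2.0] + [0.3] * (d - 2))

E = 0.01 * rng.standard_normal((k, d))   # encoder
D = 0.01 * rng.standard_normal((d, k))   # decoder
lr = 0.05
for _ in range(5000):
    R = X @ E.T @ D.T - X                # reconstruction residual, (n, d)
    gD = R.T @ (X @ E.T) / n             # dL/dD
    gE = D.T @ R.T @ X / n               # dL/dE
    D -= lr * gD
    E -= lr * gE

# Compare the span of the learned decoder with the top-k principal directions.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Q, _ = np.linalg.qr(D)                   # orthonormal basis for the decoder columns
print(np.linalg.norm(Q.T @ Vt[:k].T))    # ~sqrt(k) when the subspaces align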
Learning in deep neural networks is known to depend critically on the knowledge embedded in the initial network weights ...
In recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that previously appeared ...
Two distinct limits for deep learning have been derived as the network width h -> infinity, depending on how the weights of the last layer scale with h. ...
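Schematically, and as an illustration rather than a quote from the truncated abstract, these two limits are usually written as a 1/sqrt(h) ("lazy"/NTK) versus 1/h (mean-field) scaling of the readout:

```latex
% Illustrative notation: \sigma is the activation, w_i the hidden weights,
% a_i the readout weights, h the width.
f_{\mathrm{NTK}}(x) = \frac{1}{\sqrt{h}} \sum_{i=1}^{h} a_i\, \sigma(w_i \cdot x)
\qquad \text{vs.} \qquad
f_{\mathrm{MF}}(x) = \frac{1}{h} \sum_{i=1}^{h} a_i\, \sigma(w_i \cdot x)
```

Under the first scaling the network trains in the kernel (lazy) regime, where the hidden features barely move during training; under the second it reaches the mean-field limit, where features evolve and the network genuinely learns representations.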
The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and s...
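Dynamical isometry means the singular values of the end-to-end input-output Jacobian concentrate near 1. For a deep linear network the Jacobian is just the product of the weight matrices, so the difference between orthogonal and Gaussian initialization is easy to check numerically; the sketch below is illustrative, not taken from the paper.

```python
import numpy as np

# Illustrative check of dynamical isometry in a deep *linear* network:
# the end-to-end Jacobian is the product of the weight matrices.
# Orthogonal layers keep all of its singular values exactly 1 at any depth,
# while i.i.d. Gaussian layers spread them over many orders of magnitude.
rng = np.random.default_rng(0)
d, depth = 64, 50

def jacobian_svals(init):
    J = np.eye(d)
    for _ in range(depth):
        if init == "orthogonal":
            W, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal layer
        else:
            W = rng.standard_normal((d, d)) / np.sqrt(d)      # variance-preserving Gaussian
        J = W @ J
    return np.linalg.svd(J, compute_uv=False)

for init in ("orthogonal", "gaussian"):
    s = jacobian_svals(init)
    print(init, f"max={s.max():.2e} min={s.min():.2e}")
```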
A prominent feature of modern Artificial Neural Network classifiers is the nonlinear aspect of neural computation ...
We show that information-theoretic quantities can be used to control and describe the training process ...
While the empirical success of self-supervised learning (SSL) heavily relies on the use of deep nonlinear ...
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent i...
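Jacot et al.'s result relates gradient-descent training of wide networks to kernel methods with the Neural Tangent Kernel, Theta(x, x') = grad_theta f(x) . grad_theta f(x'). Below is a minimal sketch of the empirical NTK Gram matrix for a tiny network; the finite-difference gradient and the tanh architecture are assumptions for illustration, not anyone's published code.

```python
import numpy as np

# Illustrative empirical NTK: Theta(x, x') = grad_theta f(x) . grad_theta f(x'),
# with the parameter gradient estimated by central finite differences.
rng = np.random.default_rng(0)
d, h = 3, 16
theta = np.concatenate([rng.standard_normal(h * d) / np.sqrt(d),
                        rng.standard_normal(h) / np.sqrt(h)])

def f(theta, x):
    W = theta[: h * d].reshape(h, d)     # first-layer weights
    a = theta[h * d:]                    # readout weights
    return a @ np.tanh(W @ x)            # scalar output

def grad_f(theta, x, eps=1e-5):
    g = np.empty_like(theta)
    for i in range(theta.size):          # central differences, one coordinate at a time
        e = np.zeros_like(theta); e[i] = eps
        g[i] = (f(theta + e, x) - f(theta - e, x)) / (2 * eps)
    return g

xs = [rng.standard_normal(d) for _ in range(4)]
G = np.stack([grad_f(theta, x) for x in xs])  # (4, n_params) Jacobian of outputs
K = G @ G.T                                   # 4x4 empirical NTK Gram matrix
print(np.round(K, 3))
```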
In the past decade, deep neural networks have solved ever more complex tasks across many fronts in...
The weight initialization and the activation function of deep neural networks have a crucial impact ...
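A standard way to see how these two choices interact, shown here as an illustrative experiment rather than this paper's analysis: under ReLU, He-scaled weights (variance 2/fan_in) keep the activation scale roughly constant with depth, while Xavier-scaled weights (variance 1/fan_in) let it shrink by about half per layer.

```python
import numpy as np

# Illustrative variance-propagation experiment: push Gaussian inputs
# through many ReLU layers and track the activation scale under
# Xavier (1/fan_in) vs. He (2/fan_in) weight variance.
rng = np.random.default_rng(0)
d, depth, n = 256, 30, 512

for name, var in [("xavier", 1.0 / d), ("he", 2.0 / d)]:
    x = rng.standard_normal((d, n))
    for _ in range(depth):
        W = rng.standard_normal((d, d)) * np.sqrt(var)
        x = np.maximum(W @ x, 0.0)       # ReLU halves the second moment of a symmetric input
    print(name, f"std after {depth} layers: {x.std():.3e}")
```

Since ReLU keeps only half the second moment of a zero-mean pre-activation, the Xavier run shrinks by roughly a factor of 2 per layer (about 2^-30 in variance after 30 layers), while the He run stays at order one.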