The Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural networks, owing to the celebrated result of Jacot et al. (2018): in the infinite-width limit, the NTK is deterministic and remains constant during training. However, this result cannot explain the behavior of deep networks, since it generally fails to hold when depth and width tend to infinity simultaneously. In this paper, we study the NTK of fully-connected ReLU networks with depth comparable to width. We prove that the properties of the NTK depend significantly on the depth-to-width ratio and on the distribution of parameters at initialization. In fact, our results indicate the importance of the three phases in the hyperparameter space identified in Poole et al. (2016): ordered, chaotic ...
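As a minimal sketch of the object this abstract studies (not code from the paper): the empirical NTK of a finite network is the inner product of parameter gradients at two inputs, Theta(x1, x2) = <df/dtheta(x1), df/dtheta(x2)>. The two-layer ReLU network, its NTK parameterization, and the NumPy implementation below are illustrative assumptions.

```python
import numpy as np

def empirical_ntk(x1, x2, W, v):
    """Empirical NTK of a scalar two-layer ReLU net in NTK parameterization:
    f(x) = v @ relu(W @ x / sqrt(d)) / sqrt(n), with i.i.d. N(0, 1) weights.
    Returns the scalar kernel value <df/dtheta(x1), df/dtheta(x2)>."""
    n, d = W.shape

    def grad(x):
        h = W @ x / np.sqrt(d)                        # pre-activations
        mask = (h > 0).astype(float)                  # ReLU derivative
        gv = np.maximum(h, 0.0) / np.sqrt(n)          # df/dv
        gW = np.outer(v * mask, x) / np.sqrt(n * d)   # df/dW
        return np.concatenate([gW.ravel(), gv])       # flattened gradient

    return grad(x1) @ grad(x2)

rng = np.random.default_rng(0)
d = 3
x = rng.standard_normal(d)
for n in (100, 10_000):
    W = rng.standard_normal((n, d))
    v = rng.standard_normal(n)
    # In the fixed-depth, infinite-width regime of Jacot et al. (2018),
    # these values concentrate around a deterministic limit as n grows.
    print(n, empirical_ntk(x, x, W, v))
```

The abstract's point is that this concentration can break down when depth grows proportionally with width; the sketch above only illustrates the fixed-depth kernel itself.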
In practice, multi-task learning (through learning features shared among tasks) is an essential prop...
Deep feedforward networks initialized along the edge of chaos exhibit exponentially superior trainin...
Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's repre...
In recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that previously ap...
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent i...
Recently, neural tangent kernel (NTK) has been used to explain the dynamics of learning parameters o...
The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and s...
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization...
Two distinct limits for deep learning have been derived as the network width h -> infinity, dependin...
How well does a classic deep net architecture like AlexNet or VGG19 classify on a standard dataset s...
A recent line of work has analyzed the theoretical properties of deep neural networks via the Neural...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
This thesis aims to study recent theoretical work in machine learning research that seeks to better ...