Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, falling short of investigating the mechanisms controlling such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends strongly correlating with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- respectively controlling optimization dynamics around a critical point and bounding model functi...
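The abstract does not specify how the empirical Lipschitz constant is estimated; a common proxy is the largest input-gradient norm observed over a finite sample of inputs, which lower-bounds the true constant. A minimal PyTorch sketch under that assumption (the model, batching, and choice of output norm are illustrative, not the paper's protocol):

```python
import torch

def empirical_lipschitz(model, inputs, batch_size=128):
    """Lower-bound the Lipschitz constant of `model` by the largest
    input-gradient norm of ||f(x)|| observed over a sample of inputs."""
    model.eval()
    best = 0.0
    for start in range(0, inputs.shape[0], batch_size):
        x = inputs[start:start + batch_size].clone().requires_grad_(True)
        out = model(x)                      # assumed shape: (batch, num_outputs)
        # d||f(x)||/dx has norm <= ||J_f(x)||_op, so its largest norm over
        # the sample lower-bounds the (local) Lipschitz constant.
        grads = torch.autograd.grad(out.norm(dim=1).sum(), x)[0]
        best = max(best, grads.flatten(1).norm(dim=1).max().item())
    return best
```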
Finding the optimal size of deep learning models is a timely problem of broad impact, especially in e...
Deep neural networks (DNNs...
Deep learning has transformed computer vision, natural language processing, and speech recognition. ...
Deep networks are usually trained and tested in a regime in which the training classification error ...
The Lipschitz constant is an important quantity that arises in analysing the convergence of gradient...
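Though not shown in the truncated abstract, the simplest upper bound on a feed-forward network's Lipschitz constant is the product of its layers' spectral norms, valid when the activations (e.g. ReLU) are themselves 1-Lipschitz. A sketch assuming a plain `nn.Sequential` of `Linear` layers and elementwise activations:

```python
import torch
import torch.nn as nn

def lipschitz_upper_bound(model: nn.Sequential) -> float:
    """Product of the spectral norms of the linear layers: a (usually loose)
    upper bound on the Lipschitz constant of a feed-forward network whose
    activations are 1-Lipschitz."""
    bound = 1.0
    for layer in model:
        if isinstance(layer, nn.Linear):
            # Largest singular value of the weight matrix = operator norm.
            bound *= torch.linalg.matrix_norm(layer.weight, ord=2).item()
    return bound

# Example: a small ReLU MLP.
mlp = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
print(lipschitz_upper_bound(mlp))
```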
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
While deep learning is successful in a num...
Classical statistical learning theory implies that fitting too many parameters leads to overfitt...
The general features of the optimization problem for the case of overparametrized nonlinear networks...
Overparametrized deep networks predict well, despite the lack of an explicit ...
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we...
We investigate the robustness of deep feed-forward neural networks when input data are subject to random...
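The truncated abstract does not reveal how robustness to random input perturbations is quantified there; a simple empirical probe is the mean output displacement under i.i.d. Gaussian noise. A hypothetical sketch (the noise scale `sigma` and sample count are arbitrary choices):

```python
import torch

def noise_sensitivity(model, x, sigma=0.1, n_samples=32):
    """Average output displacement when the inputs `x` receive i.i.d.
    Gaussian noise of standard deviation `sigma`."""
    model.eval()
    with torch.no_grad():
        clean = model(x)
        deltas = []
        for _ in range(n_samples):
            noisy = model(x + sigma * torch.randn_like(x))
            deltas.append((noisy - clean).norm(dim=1))
        return torch.stack(deltas).mean().item()
```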
Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very g...
Modern deep neural networks are highly over-parameterized compared to the data on which they are tra...
In Theory IIb we use a mix of theory and experiments to characterize the optimization of deep convolut...