In this paper, we study the generalization performance of global minima obtained by implementing empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving almost optimal generalization error bounds for numerous types of data under mild conditions. Since over-parameterization is crucial to guarantee that the global minima of ERM on deep ReLU nets can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results indeed fill a gap between optimization and generalization.
Comment: 15 pages, 3 figures
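The setting the abstract refers to is SGD driving the empirical risk of an over-parameterized deep ReLU net toward zero, i.e., toward a global minimum of ERM. As a minimal sketch of that setting only, and not of the paper's deepening construction, the following PyTorch snippet fits a wide two-hidden-layer ReLU network to a small synthetic regression sample; the data, widths, learning rate, and step count are illustrative assumptions.

```python
# Minimal sketch: ERM with SGD on an over-parameterized ReLU net.
# The synthetic data and hyperparameters are illustrative, not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression sample: n data points in d dimensions, with n far below
# the parameter count of the network (over-parameterization).
n, d = 64, 4
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))  # smooth target, for illustration only

# Fully connected deep ReLU net whose parameter count vastly exceeds n.
width = 512
model = nn.Sequential(
    nn.Linear(d, width), nn.ReLU(),
    nn.Linear(width, width), nn.ReLU(),
    nn.Linear(width, 1),
)

criterion = nn.MSELoss()  # empirical risk: mean squared error on the sample
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(5000):
    idx = torch.randint(0, n, (16,))          # mini-batch for stochastic gradients
    optimizer.zero_grad()
    loss = criterion(model(X[idx]), y[idx])   # empirical risk on the mini-batch
    loss.backward()
    optimizer.step()
    if step % 1000 == 0:
        full_risk = criterion(model(X), y).item()
        print(f"step {step:5d}  empirical risk {full_risk:.3e}")

# The empirical risk typically decreases toward zero, i.e., SGD approaches a global
# minimum of ERM; the paper's question is which such minima also generalize well.
```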