© 2019 Neural Information Processing Systems Foundation. All rights reserved. Recent results in the literature indicate that a residual network (ResNet) composed of a single residual block outperforms linear predictors, in the sense that all local minima in its optimization landscape are at least as good as the best linear predictor. However, these results are limited to a single residual block (i.e., shallow ResNets) rather than the deep ResNets composed of multiple residual blocks. We take a step towards extending this result to deep ResNets. We start with two motivating examples. First, we show that there exist datasets for which all local minima of a fully-connected ReLU network are no better than the best linear predictor, whereas a ResN...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
The landscape of the empirical risk of overparametrized deep convolutional neural networks (DCNNs) i...
This paper first constructs a typical solution of ResNets for multi-category classifications by the ...
Deep learning has become an important toolkit for data science and artificial intelligence. In contr...
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
We develop new theoretical results on matrix perturbation to shed light on the impact of architectur...
Overparametrization is a key factor in the absence of convexity to explain global convergence of gra...
Accurate neural networks can be found just by pruning a randomly initialized overparameterized model...
Residual networks (ResNets) have significantly better trainability and thus performance than feed-fo...
Various powerful deep neural network architectures have made great contributions to the exciting succ...
© 7th International Conference on Learning Representations, ICLR 2019. All Rights Reserved. We inves...
Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks...
This work finds the analytical expression of the global minima of a deep linear network with weight ...
Recurrent networks are trained to memorize their input better, often in the hopes that such training...