We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: if the loss is convex and Lipschitz but not differentiable, then deep linear networks can have sub-optimal local minima.
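As a minimal formalization of the setting this abstract describes (the symbols f, W_i, d_x, d_y are our notation, not necessarily the paper's):

    % Deep linear network: a composition of L weight matrices with no nonlinearities,
    % trained with a convex, differentiable loss f applied to the end-to-end map.
    \[
      \mathcal{L}(W_1, \dots, W_L) \;=\; f\bigl(W_L W_{L-1} \cdots W_1\bigr),
      \qquad W_i \in \mathbb{R}^{d_i \times d_{i-1}},\quad d_0 = d_x,\quad d_L = d_y.
    \]
    % Stated result: if d_i >= d_x for every hidden layer i, or d_i >= d_y for every
    % hidden layer i, then every local minimum of \mathcal{L} is a global minimum.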
Conventional wisdom states that deep linear neural networks benefit from expressiveness and optimiza...
Recent results in the ...
In the last decade or so, deep learning has revolutionized entire domains of machine learning. Neura...
In this paper, we prove a conjecture published in 1989 and also partially address an open problem an...
We inves...
Understanding the loss surface of neural networks is essential for the design of models with predict...
For nonconvex optimization in machine learning, this a...
We study the optimization landscape of deep linear neural networks with the square loss. It is known...
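The square-loss objective named in this abstract can be written out concretely. The sketch below is our own illustration (the sizes d_x, d_h, d_y, the synthetic data, and plain gradient descent are assumptions, not the paper's setup): it fits a depth-3 linear network W3 W2 W1 to random data by minimizing 0.5 * ||W3 W2 W1 X - Y||_F^2.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative sizes: input, hidden, output widths and number of samples.
    d_x, d_h, d_y, n = 4, 6, 3, 50

    # Synthetic data: targets from a random linear map plus small noise.
    X = rng.standard_normal((d_x, n))
    Y = rng.standard_normal((d_y, d_x)) @ X + 0.01 * rng.standard_normal((d_y, n))

    # Depth-3 linear network: end-to-end map P = W3 @ W2 @ W1, square loss.
    W1 = 0.1 * rng.standard_normal((d_h, d_x))
    W2 = 0.1 * rng.standard_normal((d_h, d_h))
    W3 = 0.1 * rng.standard_normal((d_y, d_h))

    lr = 1e-3
    for step in range(5000):
        P = W3 @ W2 @ W1            # end-to-end linear map
        R = P @ X - Y               # residuals
        loss = 0.5 * np.sum(R ** 2)
        G = R @ X.T                 # gradient of the loss w.r.t. P
        # Chain rule through the matrix product gives the factor gradients.
        gW1 = (W3 @ W2).T @ G
        gW2 = W3.T @ G @ W1.T
        gW3 = G @ (W2 @ W1).T
        W1 -= lr * gW1
        W2 -= lr * gW2
        W3 -= lr * gW3

    print(f"final square loss: {loss:.4f}")

Because the hidden width (6) here is at least both the input and output widths, the result summarized in the first abstract above implies this square-loss objective has no sub-optimal local minima, although it does have saddle points.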
The success of deep learning has revealed the application potential of neural networks across the sc...
The general features of the optimization problem for the case of overparametrized nonlinear networks...
This work characterizes the effect of depth on the optimization landscape of linear regression, show...
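As a one-dimensional illustration of how depth reshapes the linear-regression landscape (our toy example, not taken from this work): overparametrizing the scalar problem min_w (w - c)^2 as w = ab gives

    % Depth-2 scalar linear network: nonconvex in (a, b), yet no bad local minima.
    \[
      g(a, b) = (ab - c)^2, \qquad
      \nabla g(a, b) = \bigl(2b(ab - c),\; 2a(ab - c)\bigr).
    \]
    % Critical points: every (a, b) with ab = c is a global minimum (value 0).
    % The only other critical point is the origin, where the Hessian is
    % [[0, -2c], [-2c, 0]] with eigenvalues +/- 2c, so for c != 0 it is a strict
    % saddle. Depth introduces saddle points here, but no spurious local minima.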
This work finds the analytical expression of the global minima of a deep linear network with weight ...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
In this paper, we propose a geometric framework to analyze the convergence properties of gradient de...
Training deep neural networks is a well-known highly non-convex problem. In recent works, it is show...