The optimization of multilayer neural networks typically leads to a solution with zero training error, yet the landscape can exhibit spurious local minima and the minima can be disconnected. In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization. More specifically, we prove that SGD solutions are connected via a piecewise linear path, and the increase in loss along this path vanishes as the number of neurons grows large. This result is a consequence of the fact that the parameters found by SGD are increasingly dropout stable as the network becomes larger.
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
Rectified linear units (ReLUs) have become the main model for the neural units in current deep learn...
The question of how and why the phenomenon of mode connectivity occurs in training deep neural netwo...
The deep learning optimization community has observed how neural networks' generalization ability...
The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak'...
Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss functio...
Under mild assumptions, we investigate the structure of the loss landscape of two-layer neural networks ...
Abstract: We investigate the structure of the loss function landscape for neural networks subject to...