We study the binary and continuous negative-margin perceptrons as simple nonconvex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or of a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide ...
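For concreteness, a minimal sketch of the constraint-satisfaction problem underlying both models is given below. This is not code from the paper; the parameter values are illustrative, and the setup assumed is the standard one for the negative-margin perceptron: N weights (binary, w_i in {-1,+1}, or spherical, ||w||^2 = N), P = alpha*N random patterns xi with random labels sigma, and per-pattern stabilities Delta_mu = sigma_mu (w . xi_mu)/sqrt(N) required to exceed a margin kappa < 0.

import numpy as np

rng = np.random.default_rng(0)

N, alpha, kappa = 1000, 0.5, -0.5    # illustrative values, not taken from the paper
P = int(alpha * N)
xi = rng.standard_normal((P, N))     # random input patterns
sigma = rng.choice([-1.0, 1.0], P)   # random labels (random classification task)

def stabilities(w, xi, sigma):
    # Per-pattern stabilities Delta_mu = sigma_mu * (w . xi_mu) / sqrt(N)
    return sigma * (xi @ w) / np.sqrt(len(w))

def is_solution(w, xi, sigma, kappa):
    # w is a solution of the negative-margin problem if every stability is >= kappa
    return bool(np.all(stabilities(w, xi, sigma) >= kappa))

# Binary case: weights constrained to {-1, +1}
w_bin = rng.choice([-1.0, 1.0], N)
# Spherical (continuous) case: weights on the sphere ||w||^2 = N
w_sph = rng.standard_normal(N)
w_sph *= np.sqrt(N) / np.linalg.norm(w_sph)

print("binary w satisfies all constraints:", is_solution(w_bin, xi, sigma, kappa))
print("spherical w satisfies all constraints:", is_solution(w_sph, xi, sigma, kappa))

With kappa < 0 the constraints are loosened relative to the standard perceptron, which is what makes the problem nonconvex even in the spherical case; the landscape analysis in the abstract concerns the geometry of the set of such solutions.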
We consider the existence of fixed points of nonnegative neural networks, i.e., neural networks that...
The modern strategy for training deep neural networks for classification tasks includes optimizing t...
In this paper the authors describe some useful strategies for nonconvex optimisation in order to det...
The success of deep learning has revealed the application potential of neural networks across the sc...
Current deep neural networks are highly overparameterized (up to billions of connection weights) and...
Solving large scale optimization problems, such as neural networks training, can present many challe...
Learning in deep neural networks takes place by minimizing a nonconvex high-dimensional loss functio...
The geometrical features of the (non-convex) loss landscape of neural network models are crucial in ...
Understanding the loss surface of neural networks is essential for the design of models with predict...
Critical jamming transitions are characterized by an astonishing deg...
For nonconvex optimization in machine learning, this a...
The statistical picture of the solution space for a binary perceptron is studied. The binary percept...
We investigate the clipped Hebb rule for learning different multilayer networks of nonoverlapping pe...
Convex $\ell_1$ regularization using an infinite dictionary of neurons has been suggested for constr...
Rectified linear units (ReLUs) have become the main model for the neural units in current deep learn...