We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space. By appealing to harmonic analysis we show that all local minima of such networks are non-differentiable, except for those minima that occur in a region of parameter space where the loss surface is perfectly flat. Non-differentiable minima are therefore not technicalities or pathologies; they are at the heart of the problem when investigating the loss surfaces of ReLU networks. As a consequence, we must employ techniques from nonsmooth analysis to study these loss surfaces. We show how to apply these techniques in some illustrative cases.
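The piecewise multilinear structure claimed above can be checked numerically on a toy model. The sketch below is not taken from the paper: the network size, data, and parameter values are illustrative assumptions. It fixes all weights of a small two-layer ReLU network except one, scans that weight, and evaluates the total hinge loss along the slice. Because the loss is affine in any single weight inside each activation/hinge region, the 1-D profile is piecewise linear, and a strict minimum along the slice can only sit at a kink, i.e. a non-differentiable point.

```python
# Minimal numerical sketch (illustrative, not the paper's construction):
# hinge loss of a two-layer ReLU network scanned along a single weight.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # 20 synthetic inputs with 3 features
y = np.sign(rng.normal(size=20))       # synthetic +/-1 labels

W1 = rng.normal(size=(4, 3))           # hidden layer: 4 ReLU units
w2 = rng.normal(size=4)                # linear output layer

def hinge_loss(w_scan):
    """Total hinge loss as a function of the single scanned weight W1[0, 0]."""
    W = W1.copy()
    W[0, 0] = w_scan
    h = np.maximum(0.0, X @ W.T)       # ReLU hidden activations
    f = h @ w2                         # network output
    return np.maximum(0.0, 1.0 - y * f).sum()

ts = np.linspace(-3.0, 3.0, 2001)
vals = np.array([hinge_loss(t) for t in ts])

# Within each region the profile is linear, so second differences vanish there
# and spike only at the kinks, where a ReLU or a hinge switches on or off.
curv = np.abs(np.diff(vals, 2))
print("minimum along slice at w =", ts[vals.argmin()])
print("kink locations detected:", int((curv > 1e-6).sum()))
```

Scanning a single weight only probes one coordinate direction; the non-differentiable minima described in the abstract concern the full parameter space, where the kinks of several units intersect. The slice is merely a cheap way to observe the piecewise structure directly.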
Neural networks with the Rectified Linear Unit (ReLU) nonlinearity are described by a vector of para...
We algorithmically determine the regions and facets of all dimensions of the canonical polyhedral co...
Understanding the loss surface of neural networks is essential for the design of models with predict...
We inves...
Rectified linear units (ReLUs) have become the main model for the neural units in current deep learn...
In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer an...
We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and e...
We address the following question: How redundant is the parameterisation of ReLU networks? Specific...
We identify tessellation-filtering ReLU neural networks that, when composed with another ReLU netwo...
Understanding the computational complexity of training simple neural networks with rectified linear ...
This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions...
Deep neural networks are the main subject of interest in the study of theoretical deep learning, whi...
We contribute to a better understanding of the class of functions that is represented by a neural ne...
A new loss function is proposed which learns the hinge loss function an infinite number of times pus...
By applying concepts from the statistical physics of learning, we study layered neural networks of r...