A new loss function is proposed that, in effect, minimizes the hinge loss an infinite number of times, pushing the margins $y_i f(x_i) \to \infty$. It is proven that, for a linear model on linearly separable data, gradient descent on this modified hinge loss converges in direction to the $\ell_2$ max-margin separator at a rate of $\mathcal{O}\!\left(\sqrt{d/t}\right)$, where $d$ is the dimension of the data and $t$ is the iteration count. Next, an explicit formula is derived for the dynamical system underlying the gradient descent iterates of two-layer linear networks trained on the inner-product loss. Using this dynamical system, an explicit algorithm is developed that, when implemented, exactly reproduces the gradient descent iterates of two-layer ReLU networks on the inner-product loss. This ...
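As a rough illustration of the first claim, the sketch below runs gradient descent with a linear model on separable toy data and tracks how the normalized iterate aligns with a fixed reference direction. The logistic loss is used here purely as a stand-in for the paper's modified hinge loss (it likewise drives $y_i f(x_i) \to \infty$ on separable data); the toy data, learning rate, and the use of the ground-truth direction `w_star` as a proxy for the max-margin separator are assumptions of this sketch, not details from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data labelled by a ground-truth direction
# (an assumption of this sketch; w_star is only a proxy for the
#  exact l2 max-margin direction).
d, n = 5, 200
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

def grad_logistic(w, X, y):
    """Gradient of the mean logistic loss log(1 + exp(-y_i <w, x_i>))."""
    margins = y * (X @ w)
    # sigmoid(-m) computed stably as exp(-logaddexp(0, m))
    coef = -y * np.exp(-np.logaddexp(0.0, margins))
    return (coef[:, None] * X).mean(axis=0)

w = np.zeros(d)
lr = 1.0
for t in range(1, 50001):
    w -= lr * grad_logistic(w, X, y)
    if t % 10000 == 0:
        direction = w / np.linalg.norm(w)
        # The norm keeps growing while the direction stabilizes,
        # which is the "convergence in direction" being illustrated.
        print(f"t={t:6d}  ||w||={np.linalg.norm(w):7.3f}  alignment={direction @ w_star:+.4f}")
```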
Normalized gradient descent has shown substantial success in speeding up the convergence of exponen...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
Given the training examples $\{x_i, y_i\}$, the squared hinge loss is written as: $J = \sum_{i=1}^{n} \max(0,\, 1 - y_i f(x_i))^2$ ...
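As a quick numerical sanity check of the formula above, the snippet below evaluates the squared hinge loss for a linear predictor $f(x) = \langle w, x \rangle$; the linear form of $f$ and the toy values of `X`, `y`, and `w` are assumptions made for illustration.

```python
import numpy as np

def squared_hinge_loss(w, X, y):
    """J = sum_i max(0, 1 - y_i * <w, x_i>)**2 for a linear predictor."""
    margins = y * (X @ w)
    return np.sum(np.maximum(0.0, 1.0 - margins) ** 2)

X = np.array([[1.0, 2.0], [-1.5, 0.5], [0.3, -2.0]])
y = np.array([1.0, -1.0, -1.0])
w = np.array([0.4, -0.1])
print(squared_hinge_loss(w, X, y))
```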
The deep learning optimization community has observed how the neural networks' generalization ability...
We study the optimization landscape of deep linear neural networks with the square loss. It is known...
Neural networks have been shown to perform incredibly well ...
Despite the fact that the loss functions of deep neural networks are highly non-convex, gradient-base...
In the past decade, neural networks have demonstrated impressive performance in supervised learning....
We study on-line generalized linear regression with multidimensional outputs, i.e., neural networks ...
Incorporating higher-order optimization functions, such as Levenberg-Marquardt (LM), has revealed be...
Under mild assumptions, we investigate the structure of the loss landscape of two-layer neural networks ...
Despite the success of Lipschitz regularization in stabilizing GAN training, the exact reason for its...
Policy optimization is a fundamental principle for designing reinforcement learning algorithms, and ...
In this paper, we propose a geometric framework to analyze the convergence properties of gradient de...