We propose a metric for evaluating the generalization ability of deep neural networks trained with mini-batch gradient descent. Our metric, called gradient disparity, is the l2 norm distance between the gradient vectors computed on two mini-batches drawn from the training set. It is derived from a probabilistic upper bound on the difference between the classification errors over a given mini-batch when the network is trained on this mini-batch and when it is trained on another mini-batch of points sampled from the same dataset. We empirically show that gradient disparity is a very promising early-stopping criterion (i) when data is limited, as it uses all the samples for training, and (ii) when the available data has noisy labels, as it signals...
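The abstract defines gradient disparity concretely as the l2 distance between the gradient vectors of two training mini-batches. Below is a minimal sketch of how that quantity could be computed, assuming a PyTorch classification model and a cross-entropy loss; the function names and the batch format are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn.functional as F


def flat_grad(model, inputs, targets):
    """Gradient of the mini-batch cross-entropy loss w.r.t. all trainable
    parameters, flattened into a single vector (assumed loss; illustrative)."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def gradient_disparity(model, batch1, batch2):
    """l2 norm distance between the gradient vectors of two mini-batches,
    each given as an (inputs, targets) tuple."""
    g1 = flat_grad(model, *batch1)
    g2 = flat_grad(model, *batch2)
    return torch.norm(g1 - g2, p=2).item()
```

One plausible way to use this as the early-stopping criterion the abstract describes is to average `gradient_disparity` over a few randomly drawn pairs of mini-batches at the end of each epoch and stop training once the averaged value starts to rise.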
We empirically show that the test error of deep networks can be estimated by simply training the sam...
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tun...
During minibatch gradient-based optimization, the contribution of observations to the updating of th...
Recent advancements in the field of deep learning have dramatically improved the performance of mach...
The generalization mystery in deep learning is the following: Why do over-parameterized neural netwo...
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of deep n...
In this work, we propose to progressively increase the training difficulty during learning a neural ...
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, e...
Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very g...
This paper shows that if a large neural network is used for a pattern classification problem, and th...
Sample complexity results from computational learning theory, when applied to neural network learnin...
This paper discovers that the neural network with lower decision boundary (DB) variability has bette...
As deep learning has become a solution for various machine learning and artificial intelligence applicati...
Deep learning networks are typically trained by Stochastic Gradient Descent (SGD) methods that itera...