Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax-optimal test error.
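For concreteness, the training setup the abstract describes can be written out as follows. This is a minimal, illustrative sketch under assumed notation: the width m, activation phi, step size eta, corruption fraction beta, and the fixed plus/minus-one second layer are choices made here for illustration, not necessarily the paper's exact conventions.

% Illustrative sketch of the setting; the notation (m, \phi, \eta, \beta) and the
% fixed second-layer weights are assumptions of this sketch, not the paper's exact setup.
\[
  f(x; W) \;=\; \frac{1}{m}\sum_{j=1}^{m} a_j\,\phi\big(\langle w_j, x\rangle\big),
  \qquad a_j \in \{\pm 1\} \text{ fixed at initialization.}
\]
Clean examples $(x_i, \tilde y_i)$ have $\tilde y_i \in \{\pm 1\}$ and $x_i \mid \tilde y_i$ drawn from a
well-separated, log-concave class-conditional distribution; an adversary may flip up to a constant
fraction $\beta$ of the $n$ training labels, producing the observed labels $y_i$. Training runs
gradient descent on the empirical logistic (cross-entropy) loss from random initialization,
\[
  \widehat L(W) \;=\; \frac{1}{n}\sum_{i=1}^{n}\log\big(1 + \exp\!\big(-y_i\, f(x_i; W)\big)\big),
  \qquad
  W^{(t+1)} \;=\; W^{(t)} - \eta\,\nabla \widehat L\big(W^{(t)}\big),
\]
until the network interpolates, i.e. classifies every (possibly corrupted) training example correctly.
Benign overfitting then refers to this interpolating solution also achieving small test error on the
clean distribution.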
In this paper, we propose a geometric framework to analyze the convergence properties of gradient de...
Neural networks trained to minimize the logistic (a.k.a. cross-entropy) loss with gradient-based met...
The general features of the optimization problem for the case of overparametrized nonlinear networks...
Modern neural networks often have great expressive power and can be trained to overfit the training ...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
Modern machine learning often operates in the regime where the number of parameters is much higher t...
The recent success of neural network models has shone light on a rather surprising statistical pheno...
The literature on "benign overfitting" in overparameterized models has been mostly restricted to reg...
The phenomenon of benign overfitting, where a predictor perfectly fits noisy training data while att...
Large neural networks have proved remarkably effective in modern deep learning practice, even in the...
Increasing the size of overparameterized neural networks has been shown to improve their generalizat...
The remarkable practical success of deep learning has revealed some major surprises from a theoretic...
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent ...
We investigate two causes for adversarial vulnerability in deep neural networks: bad data and (poorl...
Injecting noise within gradient descent has several desirable features. In this paper, we explore no...