Gaussian noise injections (GNIs) are a family of simple and widely used regularisation methods for training neural networks, in which one injects additive or multiplicative Gaussian noise into the network activations at every iteration of the optimisation algorithm, typically stochastic gradient descent (SGD). In this paper we focus on the so-called `implicit effect' of GNIs: the effect of the injected noise on the dynamics of SGD. We show that this effect induces an asymmetric heavy-tailed noise on the SGD gradient updates. To model these modified dynamics, we first develop a Langevin-like stochastic differential equation that is driven by a general family of asymmetric heavy-tailed noise. U...
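The GNI scheme this abstract describes can be sketched in a few lines: at every forward pass, each hidden activation is perturbed either additively or multiplicatively with Gaussian noise. The following is a minimal illustration only, assuming a small ReLU network and a hypothetical noise scale `sigma`; it is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_with_gni(x, weights, sigma=0.1, multiplicative=False):
    """Forward pass of a small ReLU MLP with Gaussian noise injected
    into every hidden activation (additive or multiplicative GNI).
    `sigma` is a hypothetical noise scale, not taken from the paper."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)  # hidden-layer ReLU activation
        if multiplicative:
            # multiplicative GNI: scale each activation by 1 + noise
            h = h * (1.0 + sigma * rng.standard_normal(h.shape))
        else:
            # additive GNI: shift each activation by noise
            h = h + sigma * rng.standard_normal(h.shape)
    return h @ weights[-1]  # noiseless linear output layer

# tiny usage example: 4-dim inputs, one hidden layer of width 8
weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 1))]
x = rng.standard_normal((2, 4))
y_noisy = forward_with_gni(x, weights, sigma=0.1)
y_clean = forward_with_gni(x, weights, sigma=0.0)
```

During training one would backpropagate through this noisy forward pass at each SGD step; the injected noise then perturbs the gradient updates, which is exactly the implicit effect the paper analyses.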
The classical statistical learning theory implies that fitting too many parameters leads to overfitt...
Stochastic gradient descent (SGD) is the workhorse algorithm of deep learning technology. At each st...
The gradient noise of SGD is considered to play a central role in the observed strong generalization...
Deep Learning (read neural networks) has emerged as one of the most exciting and powerful tools in t...
Understanding the implicit bias of training algorithms is of crucial importance in order to explain ...
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in...
Noise Injection consists in adding noise to the inputs during neural network training. Experimental ...
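Input noise injection, as defined in this abstract, amounts to perturbing each training input with random noise before the forward pass. A minimal sketch, assuming i.i.d. Gaussian noise and a hypothetical scale `sigma`:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_batch(x, sigma=0.1):
    """Input noise injection: add i.i.d. Gaussian noise to each
    training input before it enters the network. `sigma` is a
    hypothetical noise scale, not taken from the abstract."""
    return x + sigma * rng.standard_normal(x.shape)

# usage: perturb a batch of 3 five-dimensional inputs
x = np.ones((3, 5))
x_tilde = noisy_batch(x, sigma=0.1)
```

Each SGD step would then train on a freshly perturbed copy `x_tilde` rather than the clean inputs, which is the regularisation mechanism the abstract studies experimentally.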
We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent and weight d...
Modern neural networks can easily fit their training set perfectly. Surprisingly, they generalize we...
Injecting noise within gradient descent has several desirable features. In this paper, we explore no...
Understanding the implicit bias of training algorithms is of crucial importanc...
In this thesis, we are concerned with the Stochastic Gradient Descent (SGD) algorithm. Specifically,...
Injecting artificial noise into gradient descent (GD) is commonly employed to ...
Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology. At each st...
In Neural Information Processing Systems (NeurIPS), Spotlight Presentation, 2023. A recent line of emp...