Understanding the implicit bias of training algorithms is crucial to explaining the success of overparametrised neural networks. In this paper, we study the role of label noise in the training dynamics of a quadratically parametrised model through its continuous-time version. We explicitly characterise the solution chosen by the stochastic flow and prove that it implicitly solves a Lasso program. To complete our analysis, we provide non-asymptotic convergence guarantees for the dynamics as well as conditions for support recovery. We also give experimental results that support our theoretical claims. Our findings highlight the fact that structured noise can induce better generalisation and ...
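As a minimal sketch of the setting described above, the display below writes out one standard quadratic parametrisation and the Lasso-type program it is typically linked to; the specific parametrisation, loss, and penalised formulation are assumptions based on the common setup in this line of work, not details quoted from the abstract.

% Assumed setting (schematic only): linear predictor with a quadratic
% (Hadamard-product) parametrisation, trained on the squared loss.
\[
  \beta_{u,v} \;=\; u \odot u \;-\; v \odot v,
  \qquad
  L(u,v) \;=\; \frac{1}{2n}\,\lVert X\beta_{u,v} - y \rVert_2^2 .
\]
% Implicit-bias claim, stated schematically: under label-noise perturbations of the
% gradient flow, the limiting predictor solves an \ell_1-penalised (Lasso) program,
\[
  \beta^\star \;\in\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^d}
  \;\frac{1}{2n}\,\lVert X\beta - y \rVert_2^2 \;+\; \lambda \lVert \beta \rVert_1 ,
\]
% where the effective level \lambda depends on the noise scale and step size
% (the precise dependence is the subject of the paper and is not reproduced here).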
Gaussian noise injections (GNIs) are a family of simple and widely-used regula...
The study of the implicit regularization induced by gradient-based optimization in deep learning is ...
Neural networks trained via gradient descent with random initialization and without any regularizati...
Understanding the implicit bias of training algorithms is of crucial importance in order to explain ...
Stochastic gradient descent (SGD) is the workhorse algorithm of deep learning technology. At each st...
Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology. At each st...
Injecting noise within gradient descent has several desirable features. In this paper, we explore no...
Stochastic gradient descent (SGD) has been widely used in machine learning due...
Training over-parameterized neural networks involves the empirical minimizatio...
Large-scale machine learning problems can be reduced to non-convex optimization problems if state-of...
In the context of statistical supervised learning, the noiseless linear model ...
We study the training dynamics of a shallow neural network with quadratic activation functions and q...
Deep neural networks achieve stellar generalisation even when they have enough...
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in...