Stochastic Gradient Descent (SGD) is the workhorse algorithm of deep learning technology. At each step of the training phase, a mini-batch of samples is drawn from the training dataset, and the weights of the neural network are adjusted according to the performance on this specific subset of examples. The mini-batch sampling procedure introduces stochastic dynamics into the gradient descent, with a non-trivial state-dependent noise. We characterize the stochasticity of SGD and of a recently introduced variant, persistent SGD, in a prototypical neural network model. In the under-parametrized regime, where the final training error is positive, the SGD dynamics reaches a stationary state and we define an effective temperature from the fluct...
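To make the sampling procedure concrete, here is a minimal NumPy sketch of one update step of plain mini-batch SGD next to one step of the persistent variant. It assumes the usual formulation in which persistent SGD refreshes each sample's batch-membership flag only at a small rate 1/tau rather than redrawing the whole batch every step; the function names, the batch fraction b, and the persistence time tau are illustrative choices of this sketch, not notation taken from the abstract.

import numpy as np

def sgd_step(w, X, y, grad_loss, lr, batch_size, rng):
    """Plain mini-batch SGD: redraw a fresh random mini-batch at
    every step and move against its average gradient."""
    idx = rng.choice(len(X), size=batch_size, replace=False)
    return w - lr * grad_loss(w, X[idx], y[idx])

def persistent_sgd_step(w, X, y, grad_loss, lr, mask, b, tau, rng):
    """Persistent SGD (sketch): each sample carries a binary
    membership flag; a flag is refreshed only with probability
    1/tau per step (set to True with probability b), so the batch
    decorrelates over ~tau steps instead of at every step."""
    refresh = rng.random(len(X)) < 1.0 / tau
    mask[refresh] = rng.random(refresh.sum()) < b
    batch = np.flatnonzero(mask)
    if batch.size:  # guard against the (unlikely) empty batch
        w = w - lr * grad_loss(w, X[batch], y[batch])
    return w, mask

# Toy usage on least-squares regression (all names illustrative).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
y = X @ rng.standard_normal(20)
grad = lambda w, Xb, yb: 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w, mask = np.zeros(20), rng.random(1000) < 0.1
for _ in range(2000):
    w, mask = persistent_sgd_step(w, X, y, grad, lr=0.05, mask=mask,
                                  b=0.1, tau=20.0, rng=rng)

The noise is state-dependent in the sense described above because the spread of the per-sample gradients, and hence the covariance of the stochastic update, changes with the current weights w.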
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we...
A theoretical, and potentially also practical, problem with stochastic gradient descent is that traj...
We analyze in closed form the learning dynamics...
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used extensively to train artif...
Is Stochastic Gradient Descent (SGD) substantially different from Glauber dynamics? This is a fundam...
Stochastic gradient descent (SGD) has been widely used in machine learning due...
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achiev...
Understanding the implicit bias of training algorithms is crucial in order to explain ...
We analyze deep ReLU neural networks trained with mini-batch stochastic gradient descent and weight d...
In this thesis, we are concerned with the Stochastic Gradient Descent (SGD) algorithm. Specifically,...
The deep learning optimization community has observed how neural networks' generalization ability...
The gradient noise of SGD is considered to play a central role in the observed strong generalization...
Thesis (Ph.D.), University of Washington, 2019. Tremendous advances in large-scale machine learning an...
Over the decades, gradient descent has been applied to develop learning algorithms to train neural netw...