Stochastic Gradient Descent (SGD) is the workhorse for training large-scale machine learning applications. Although the convergence rate of its deterministic counterpart, Gradient Descent (GD), is provably accelerated by momentum-based variants such as Heavy Ball (HB) and Nesterov Accelerated Gradient (NAG), local convergence analysis has not shown that these modifications yield faster rates in the stochastic setting. This work empirically establishes that a positive momentum coefficient in SGD effectively enlarges the algorithm's learning rate rather than boosting performance per se. In the deep learning setting, however, this enlargement tends to be conduc...
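As a quick illustration of the learning-rate claim above, here is a minimal sketch assuming the standard heavy-ball (SGDM) recursion; the notation is ours, not the paper's:

\[ v_{t+1} = \beta v_t + \nabla f(x_t), \qquad x_{t+1} = x_t - \alpha v_{t+1}. \]

If the gradient changes slowly across iterations, the velocity approaches the geometric sum \( v_t \approx \sum_{k \ge 0} \beta^k \nabla f(x_t) = \tfrac{1}{1-\beta} \nabla f(x_t) \), so each step behaves like plain SGD with an effective learning rate of \( \alpha/(1-\beta) \), e.g. ten times larger when \( \beta = 0.9 \).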
Large-scale learning problems require algorithms that scale benignly with respect to the size of the...
Short version of https://arxiv.org/abs/1709.01427. When applied to training deep...
SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machi...
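As a rough sketch of the SGDM update these entries refer to, assuming the common heavy-ball formulation (function names and constants below are illustrative, not taken from any of the cited papers):

```python
import numpy as np

def sgdm_step(x, v, grad, lr=0.01, momentum=0.9):
    # Velocity accumulates a geometrically decaying sum of past gradients.
    v = momentum * v + grad
    # Parameters move against the accumulated direction, scaled by the learning rate.
    x = x - lr * v
    return x, v

# Toy usage on the quadratic f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([5.0, -3.0])
v = np.zeros_like(x)
for _ in range(200):
    x, v = sgdm_step(x, v, grad=x)
print(x)  # approaches the minimizer at the origin
```

Setting momentum to zero recovers plain SGD, which is why SGDM is usually described as a family of algorithms that contains SGD as a special case.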
Momentum is known to accelerate the convergence of gradient descent in strongly convex settings with...
© 2018 International Joint Conferences on Artificial Intelligence. All rights reserved. Stochastic mo...
Recently, Stochastic Gradient Descent (SGD) and its variants have become the dominant methods in the...
Thesis (Ph.D.), University of Washington, 2019. Tremendous advances in large scale machine learning an...
This paper examines the convergence rate and mean-square-error performance of momentum stochastic gr...
The article examines in some detail the convergence rate and mean-square-error performance of moment...
Stochastic Gradient Descent (SGD) and its variants are the most widely used algorithms in machine learning ...
Momentum-based learning algorithms are among the most successful learning algorithms in both convex...
Gradient descent-based optimization methods underpin the parameter training that results in the impr...
In the age of artificial intelligence, the best approach to handling huge amounts of data is a treme...