We study generalization properties of random features (RF) regression in high dimensions, optimized by stochastic gradient descent (SGD) in the under-/over-parameterized regime. In this work, we derive precise non-asymptotic error bounds of RF regression under both constant and polynomial-decay step-size SGD settings, and observe the double descent phenomenon both theoretically and empirically. Our analysis shows how to cope with multiple randomness sources of initialization, label noise, and data sampling (as well as stochastic gradients) with no closed-form solution, and also goes beyond the commonly used Gaussian/spherical data assumption. Our theoretical results demonstrate that, with SGD training, RF regression still generalizes well for inter...
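As a rough illustration of the setting this abstract describes (RF regression trained by single-pass SGD under a constant or polynomial-decay step size), here is a minimal sketch. The ReLU feature map, the dimensions, the zero initialization, and the step-size constants are assumptions for illustration only, not the paper's exact configuration or data model.

```python
# Minimal sketch: random-features regression trained with single-pass SGD,
# under either a constant step size or a polynomial-decay schedule
# gamma_t = gamma0 * (t+1)^(-decay). All constants below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 2000, 50, 400                 # samples, input dim, number of random features
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)
y = X @ w_star + 0.1 * rng.standard_normal(n)        # noisy target (label noise)

W = rng.standard_normal((N, d))                       # random, frozen first-layer weights
phi = lambda Z: np.maximum(Z @ W.T / np.sqrt(d), 0.0) / np.sqrt(N)   # normalized ReLU features

def sgd_rf(X, y, gamma0=0.5, decay=0.0):
    """Single-pass SGD on the RF least-squares objective.
    decay = 0.0 gives a constant step size; decay > 0 gives gamma_t = gamma0 * (t+1)^(-decay)."""
    theta = np.zeros(N)                               # zero initialization of trainable weights
    for t in range(len(y)):
        g = phi(X[t:t + 1])[0]                        # features of the t-th sample
        grad = (g @ theta - y[t]) * g                 # stochastic gradient of the squared loss
        theta -= gamma0 * (t + 1) ** (-decay) * grad
    return theta

theta_const = sgd_rf(X, y, gamma0=0.5, decay=0.0)     # constant step size
theta_decay = sgd_rf(X, y, gamma0=1.0, decay=0.5)     # polynomial-decay step size

X_test = rng.standard_normal((500, d))
y_test = X_test @ w_star
for name, th in [("constant", theta_const), ("decay", theta_decay)]:
    mse = np.mean((phi(X_test) @ th - y_test) ** 2)
    print(f"{name} step size: test MSE = {mse:.4f}")
```

Varying N relative to n in this sketch (under- vs. over-parameterization) is one way to probe the double descent behaviour the abstract refers to, though the sketch makes no claim about matching the paper's bounds.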
Stochastic mirror descent (SMD) algorithms have recently garnered a great deal of attention in optim...
We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD),...
Online learning algorithms often require recomputing least squares regression estimates of paramete...
We study generalization properties of random features (RF) regression in high dimensions optimized b...
We study generalised linear regression and classification for a synthetically generated dataset enco...
see also: 2020 Generalisation error in learning with random features and the hidden manifold model, ...
Sketching and stochastic gradient methods are arguably the most common techniques to derive efficien...
We prove a non-asymptotic distribution-independent lower bound for the expected mean squared general...
Recent theoretical studies illustrated that kernel ridgeless regression can guarantee good generaliz...
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we...
We study the convergence, the implicit regularization and the generalization of stochastic mirror de...
We develop a stochastic differential equation, called homogenized SGD, for analyzing the dynamics of...
Stochastic descent methods (of the gradient and mirror varieties) have become increasingly popular i...
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" lear...
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in...