In Neural Information Processing Systems (NeurIPS), Spotlight Presentation, 2023. A recent line of empirical studies has demonstrated that SGD might exhibit a heavy-tailed behavior in practical settings, and the heaviness of the tails might correlate with the overall performance. In this paper, we investigate the emergence of such heavy tails. Previous works on this problem only considered, to our knowledge, online (also called single-pass) SGD, in which the emergence of heavy tails in theoretical findings is contingent upon access to an infinite amount of data. Hence, the underlying mechanism generating the reported heavy-tailed behavior in practical settings, where the amount of training data is finite, is still not well understood. Our ...
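As a rough illustration of the kind of diagnostic such studies rely on, the sketch below runs plain single-sample SGD on a small, fixed training set (sampling with replacement, i.e. multi-pass rather than online) and estimates the tail index of the resulting iterates with a Hill estimator. This is a toy sketch, not the paper's experimental setup; the 1-d least-squares problem, the step size, and the helper names sgd_trace / hill_tail_index are assumptions made here for illustration only.

    import numpy as np

    rng = np.random.default_rng(0)

    # A fixed, finite training set for 1-d least squares: y_i = 1.5 * x_i + noise.
    n = 100
    x = rng.standard_normal(n)
    y = 1.5 * x + 0.1 * rng.standard_normal(n)

    def sgd_trace(eta=0.7, steps=100_000):
        """Single-sample SGD on the finite set, sampling with replacement (multi-pass)."""
        w, trace = 0.0, np.empty(steps)
        for t in range(steps):
            i = rng.integers(n)                     # pick one of the n fixed samples
            w -= eta * (w * x[i] - y[i]) * x[i]     # stochastic gradient step on 0.5*(w*x_i - y_i)^2
            trace[t] = w
        return trace

    def hill_tail_index(samples, k=1000):
        """Hill estimator of the tail index, computed from the k largest |samples|."""
        s = np.sort(np.abs(samples))[::-1]
        return k / np.log(s[:k] / s[k]).sum()

    trace = sgd_trace()
    print("Hill tail-index estimate:", hill_tail_index(trace[len(trace) // 2:]))  # discard burn-in

A smaller estimated tail index indicates heavier tails in the distribution of the iterates; the estimate depends on the step size, the batch size, and the particular finite sample drawn.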
In this thesis, we are concerned with the Stochastic Gradient Descent (SGD) algorithm. Specifically,...
Stochastic-approximation gradient methods are attractive for large-scale convex optimizati...
Deep neural networks achieve stellar generalisation even when they have enough...
Recent theoretical studies have shown that heavy tails can emerge in stochastic optimization due to ...
Stochastic gradient descent (SGD) has been widely used in machine learning due...
The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak'...
Recent studies have provided both empirical and theoretical evidence illustrat...
The deep learning optimization community has observed how neural networks' generalization ability...
The gradient noise of Stochastic Gradient Descent (SGD) is considered to play a key role in its prop...
Gaussian noise injections (GNIs) are a family of simple and widely-used regula...
We introduce a general framework for nonlinear stochastic gradient descent (SGD) for the scenarios w...
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used extensively to train artif...
We study stochastic algorithms in a streaming framework, trained on samples coming from a dependent ...
Thesis (Ph.D.), University of Washington, 2019. Tremendous advances in large scale machine learning an...