In empirical risk optimization, it has been observed that stochastic gradient implementations that rely on random reshuffling of the data achieve better performance than implementations that rely on sampling the data uniformly at random, independently at each iteration. Recent works have pursued justifications for this behavior by examining the convergence rate of the learning process under diminishing step-sizes. Some of these justifications rely on loose bounds, or their conclusions depend on the sample size, which is problematic for large datasets. This work focuses on constant step-size adaptation, where the agent is continuously learning. In this case, convergence is only guaranteed to a small neighborhood of the optimizer, albeit at a linear rate...
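To make the two sampling schemes concrete, here is a minimal sketch (not the paper's experiment) that runs constant step-size stochastic gradient descent on a synthetic least-squares empirical risk, once with uniform i.i.d. sampling and once with random reshuffling. The problem setup, the step-size value `mu`, and the helper `sgd` are illustrative assumptions, not taken from the source.

```python
# Minimal sketch: constant step-size SGD under i.i.d. sampling vs. random reshuffling.
# Synthetic least-squares risk; not the experiment from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Empirical risk: J(w) = (1/N) * sum_n 0.5 * (x_n^T w - y_n)^2
N, d = 200, 10
X = rng.standard_normal((N, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(N)
w_opt = np.linalg.lstsq(X, y, rcond=None)[0]   # minimizer of the empirical risk

mu = 0.01       # constant step-size (assumed value)
epochs = 100

def sgd(sampling):
    """Run constant step-size SGD; `sampling` is 'iid' or 'reshuffle'."""
    w = np.zeros(d)
    for _ in range(epochs):
        if sampling == "reshuffle":
            # Random reshuffling: draw a fresh permutation each epoch,
            # so every sample is visited exactly once per epoch.
            order = rng.permutation(N)
        else:
            # Uniform i.i.d. sampling with replacement over one epoch's worth of steps.
            order = rng.integers(0, N, size=N)
        for n in order:
            grad = (X[n] @ w - y[n]) * X[n]     # gradient of the n-th loss term
            w = w - mu * grad
    return w

for scheme in ("iid", "reshuffle"):
    w = sgd(scheme)
    msd = np.sum((w - w_opt) ** 2)
    print(f"{scheme:10s}  squared distance to minimizer: {msd:.3e}")
```

With a small constant step-size, both runs settle into a neighborhood of the minimizer rather than converging exactly; the reshuffled run typically ends up noticeably closer, which is the behavior the abstract sets out to explain.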
We consider the minimization of an objective function given access to unbiased estimates of its grad...
Current machine learning practice requires solving huge-scale empirical risk minimization problems q...
The steplength selection is a crucial issue for the effectiveness of the stochastic gradient methods...
In empirical risk optimization, it has been observed that stochastic gradient implementations that r...
This dissertation focuses on stochastic gradient learning for problems involving large data sets or ...
Abstract We analyze the convergence rate of the random reshuffling (RR) method, which...
We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax opt...
Gradient compression is a popular technique for improving communication complexity of stochastic fir...
Stochastic Gradient Descent (SGD) is a popular tool in training large-scale machine learning models....
We study the random reshuffling (RR) method for smooth nonconvex optimization problems with a finite...
Abstract: Stochastic gradient descent is an optimisation method that combines classical gradient des...
Random reshuffling, which randomly permutes the dataset each epoch, is widely adopted in model train...
Stochastic Gradient Descent (SGD) algorithms are widely used in optimizing neural networks, with Ran...