Regularization plays an important role in generalization of deep learning. In this paper, we study the generalization power of an unbiased regularizor for training algorithms in deep learning. We focus on training methods called Locally Regularized Stochastic Gradient Descent (LRSGD). An LRSGD leverages a proximal type penalty in gradient descent steps to regularize SGD in training. We show that by carefully choosing relevant parameters, LRSGD generalizes better than SGD. Our thorough theoretical analysis is supported by experimental evidence. It advances our theoretical understanding of deep learning and provides new perspectives on designing training algorithms. The code is available at https://github.com/huiqu18/LRSGD
Injecting noise within gradient descent has several desirable features. In this paper, we explore no...
We study the effect of regularization in an on-line gradient-descent learning scenario for a general...
We extend Deep Deterministic Policy Gradient, a state of the art algorithm for continuous control, i...
Regularization plays an important role in generalization of deep learning. In this paper, we study t...
In this work, we construct generalization bounds to understand existing learning algorithms and prop...
Abstract: It is well known that the generalization capability is one of the most important criterion...
The classical statistical learning theory implies that fitting too many parameters leads to overfitt...
We define notions of stability for learning algorithms and show how to use these notions to derive g...
This thesis reports on experiments aimed at explaining why machine learning algorithms using the gre...
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we...
22 pages, 6 figuresInternational audienceDespite the ubiquitous use of stochastic optimization algor...
In this thesis we aim to improve generalisation in deep reinforcement learning. Generalisation is a ...
Within a statistical learning setting, we propose and study an iterative regularization algorithm fo...
A central problem in statistical learning is to design prediction algorithms that not only perform w...
The generalization mystery in deep learning is the following: Why do over-parameterized neural netwo...
Injecting noise within gradient descent has several desirable features. In this paper, we explore no...
We study the effect of regularization in an on-line gradient-descent learning scenario for a general...
We extend Deep Deterministic Policy Gradient, a state of the art algorithm for continuous control, i...
Regularization plays an important role in generalization of deep learning. In this paper, we study t...
In this work, we construct generalization bounds to understand existing learning algorithms and prop...
Abstract: It is well known that the generalization capability is one of the most important criterion...
The classical statistical learning theory implies that fitting too many parameters leads to overfitt...
We define notions of stability for learning algorithms and show how to use these notions to derive g...
This thesis reports on experiments aimed at explaining why machine learning algorithms using the gre...
[previously titled "Theory of Deep Learning III: Generalization Properties of SGD"] In Theory III we...
22 pages, 6 figuresInternational audienceDespite the ubiquitous use of stochastic optimization algor...
In this thesis we aim to improve generalisation in deep reinforcement learning. Generalisation is a ...
Within a statistical learning setting, we propose and study an iterative regularization algorithm fo...
A central problem in statistical learning is to design prediction algorithms that not only perform w...
The generalization mystery in deep learning is the following: Why do over-parameterized neural netwo...
Injecting noise within gradient descent has several desirable features. In this paper, we explore no...
We study the effect of regularization in an on-line gradient-descent learning scenario for a general...
We extend Deep Deterministic Policy Gradient, a state of the art algorithm for continuous control, i...