Modern machine learning models, particularly those used in deep networks, are characterized by massive numbers of parameters trained on large datasets. While these large-scale learning methods have had tremendous practical successes, developing theoretical tools that can rigorously explain when and why these models work has remained an outstanding issue in the field. This dissertation provides a theoretical basis for understanding learning dynamics and generalization in high-dimensional regimes. It brings together two important tools that offer the potential for a rigorous analytic understanding of modern problems: the statistics of high-dimensional random systems and neural tangent kernels. These frameworks enable the precise characterization...
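For reference, the second of these tools can be summarized by a standard formula (this is the textbook form of the neural tangent kernel, not a result specific to this dissertation; f denotes a network with parameters θ and initialization θ₀):

\[ f(x;\theta) \;\approx\; f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^\top (\theta - \theta_0), \qquad \Theta(x,x') \;=\; \nabla_\theta f(x;\theta_0)^\top \nabla_\theta f(x';\theta_0). \]

In the infinite-width limit the kernel Θ remains constant during training, so gradient descent on the squared loss reduces to kernel regression with Θ.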
Empirical observation of high-dimensional phenomena, such as the double descent behavior, has attrac...
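As a purely illustrative sketch (the random-features regression below, including all dimensions, is an assumption and is not taken from the work summarized above), double descent can be reproduced numerically with min-norm least squares on random ReLU features; the test error typically peaks near the interpolation threshold p ≈ n and decreases again for p ≫ n.

import numpy as np

rng = np.random.default_rng(0)
n, d, noise = 100, 30, 0.1                          # train size, input dim, label noise
w_star = rng.standard_normal(d) / np.sqrt(d)        # ground-truth linear teacher

def sample(m):
    X = rng.standard_normal((m, d))
    return X, X @ w_star + noise * rng.standard_normal(m)

X_train, y_train = sample(n)
X_test, y_test = sample(2000)

for p in (10, 50, 90, 100, 110, 200, 1000):         # number of random features
    W = rng.standard_normal((d, p)) / np.sqrt(d)    # fixed random projection
    F_train = np.maximum(X_train @ W, 0.0)          # random ReLU features
    F_test = np.maximum(X_test @ W, 0.0)
    coef = np.linalg.pinv(F_train) @ y_train        # min-norm least-squares fit
    mse = np.mean((F_test @ coef - y_test) ** 2)
    print(f"p = {p:5d}   test MSE = {mse:.3f}")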
This article proposes an original approach to understanding the performance of...
Understanding the implicit bias of gradient descent, and its role in the generalization capability of ReLU networks, has b...
It was recently shown that wide neural networks can be approximated by linear models under gradient...
In this chapter, we describe the basic concepts behind the functioning of recurrent neural networks ...
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent ...
Empirical studies show that gradient-based methods can learn deep neural networks (DNNs) with very g...
Understanding the learning dynamics of neural networks is one of the key issue...
Despite the widespread practical success of deep learning methods, our theoretical understanding of ...
Neural networks (NNs) have seen a surge in popularity due to their unprecedented practical success i...
Neural networks trained via gradient descent with random initialization and without any regularizati...
We study generalised linear regression and classification for a synthetically generated dataset enco...
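A minimal sketch of what such a synthetically generated dataset might look like (the latent-structure generative model below is an illustrative assumption, not the exact model studied in the abstract above): inputs are obtained by pushing low-dimensional latent variables through a fixed nonlinear map, while the labels depend only on the latents.

import numpy as np

rng = np.random.default_rng(1)
n, d_latent, d = 500, 20, 200                                   # samples, latent dim, input dim

C = rng.standard_normal((d_latent, d)) / np.sqrt(d_latent)      # fixed projection to input space
Z = rng.standard_normal((n, d_latent))                          # latent variables
X = np.tanh(Z @ C)                                              # inputs carrying latent structure

w_teacher = rng.standard_normal(d_latent) / np.sqrt(d_latent)   # teacher acts on the latents
y_reg = Z @ w_teacher                                           # regression targets
y_cls = np.sign(Z @ w_teacher)                                  # binary classification labels

Generalised linear regression or classification would then be fit on (X, y_reg) or (X, y_cls).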
In this work, we provide a characterization of the feature-learning process in two-layer ReLU networ...
We study the effect of regularization in an on-line gradient-descent learning scenario for a general...
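For orientation only (the abstract above is truncated and does not specify the learning rule; the form below is a generic assumption), an on-line gradient-descent update with an L2 regularizer of strength λ reads

\[ w_{t+1} = w_t - \eta\,\big[\nabla_w \ell(w_t; x_t, y_t) + \lambda\, w_t\big], \]

where each example (x_t, y_t) is seen once and the weight-decay term shrinks the weights at every step.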
In recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that previously ap...