We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian population gradient flow dynamics, and provide a quantitative description of its associated saddle-to-saddle dynamics.
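To make the two-timescale scheme concrete, the following is a minimal numerical sketch (ours, not the paper's code): the link function is refit to near-optimality at every outer step, standing in for the "infinitely faster" inner timescale, while the orthonormal frame W spanning the subspace takes a Riemannian gradient step on the Grassmannian, realized by a QR retraction. The kernel-ridge inner fit, the finite-difference gradient, and all hyperparameters are illustrative assumptions.

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    def two_timescale_step(W, X, y, lr=0.1, eps=1e-4):
        # Inner ("fast") timescale: non-parametric fit of the link function on
        # the current projection Z = X W; kernel ridge is a stand-in choice.
        Z = X @ W
        link = KernelRidge(kernel="rbf", alpha=1e-3).fit(Z, y)
        # Outer ("slow") timescale: finite-difference estimate of the risk
        # gradient with respect to W, holding the fitted link fixed (an
        # illustrative proxy for the population gradient analyzed in the paper).
        base = np.mean((link.predict(Z) - y) ** 2)
        G = np.zeros_like(W)
        for i in range(W.shape[0]):
            for j in range(W.shape[1]):
                Wp = W.copy()
                Wp[i, j] += eps
                G[i, j] = (np.mean((link.predict(X @ Wp) - y) ** 2) - base) / eps
        # Grassmannian step: project the gradient onto the horizontal space at W,
        # descend, then retract back to an orthonormal frame via QR.
        G -= W @ (W.T @ G)
        W_new, _ = np.linalg.qr(W - lr * G)
        return W_new

Iterating this step from a random orthonormal initialization, e.g. W0 = np.linalg.qr(np.random.randn(d, r))[0], mimics the population flow whose global convergence the abstract describes.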
Natural gradient descent (NGD) is an on-line algorithm for redefining the steepest descent direction...
Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex f...
Neural networks have achieved remarkable empirical performance, while the current theoretical analys...
Single-index models are a class of functions given by an unknown univariate ``link'' function applie...
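For concreteness, the single-index class referred to here can be written as follows (a standard formulation in our notation, not quoted from this snippet's source):

    f(x) = g(\langle \theta, x \rangle), \qquad \theta \in \mathbb{R}^d,\; g : \mathbb{R} \to \mathbb{R} \ \text{both unknown},

and the multi-index model of the abstract above generalizes it by replacing the single direction \theta with a rank-r projection, f(x) = g(U^\top x) with U \in \mathbb{R}^{d \times r}, r \ll d.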
We analyze in a closed form the learning dynamics of the stochastic gradient descent (SGD) for a sin...
We analyse natural gradient learning in a two-layer feed-forward neural network using a statistical ...
Data in many scientific and engineering applications are structured and contain multiple aspects. Th...
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks wit...
Understanding the impact of data structure on the computational tractability of learning is a key ch...
We present a probabilistic analysis of the long-time behaviour of the nonlocal, diffusive equations ...
Many tasks in machine learning and signal processing can be solved by minimizi...
Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achiev...
Understanding the reasons for the success of deep neural networks trained using stochastic gradient-...
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent ...