The paper contains approximation guarantees for neural networks trained with gradient flow, with error measured in the continuous $L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and with Sobolev-smooth targets. The networks are fully connected, of constant depth and increasing width. Although all layers are trained, the gradient flow convergence is based on a neural tangent kernel (NTK) argument for the non-convex second-to-last layer. Unlike standard NTK analyses, the continuous error norm implies an under-parametrized regime, made possible by the natural smoothness assumption required for approximation. The typical over-parametrization re-enters the results in the form of a loss in approximation rate relative to establ...
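The setup described in this abstract can be illustrated with a minimal sketch (not the paper's construction): a small two-layer ReLU network trained by an Euler discretization of gradient flow on a smooth target, with the continuous $L_2(\mathbb{S}^{d-1})$ error estimated by Monte Carlo over points drawn uniformly from the unit sphere. The architecture, target function, width, and step size below are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch: discretized gradient flow on a two-layer ReLU network,
# with the continuous L2 error on the sphere estimated by Monte Carlo.
import numpy as np

rng = np.random.default_rng(0)
d, width, steps, lr = 3, 256, 2000, 1e-2   # illustrative choices

def sample_sphere(n):
    """Draw n points uniformly from the unit sphere S^{d-1}."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def target(x):
    """A smooth target function on the sphere (placeholder choice)."""
    return np.sin(2.0 * x[:, 0]) * x[:, 1]

# Two-layer network f(x) = a^T relu(W x); both layers are trained.
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

X = sample_sphere(4096)          # training sample on the sphere
y = target(X)

for _ in range(steps):           # Euler discretization of gradient flow
    pre = X @ W.T                # pre-activations, shape (n, width)
    act = np.maximum(pre, 0.0)
    resid = act @ a - y          # residuals, shape (n,)
    # exact gradients of the loss (1/2n) * ||resid||^2
    grad_a = act.T @ resid / len(X)
    grad_W = ((resid[:, None] * a) * (pre > 0)).T @ X / len(X)
    a -= lr * grad_a
    W -= lr * grad_W

# Monte Carlo estimate of the continuous L2(S^{d-1}) error of the fit
Xt = sample_sphere(20000)
err = np.sqrt(np.mean((np.maximum(Xt @ W.T, 0.0) @ a - target(Xt)) ** 2))
print(f"estimated L2 error on the sphere: {err:.4f}")
```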
Training over-parameterized neural networks involves the empirical minimizatio...
We study the computational complexity of (deterministic or randomized) algorithms based on point sam...
We prove that two-layer (Leaky)ReLU networks with one-dimensional input and output trained using gra...
This work features an original result linking approximation and optimization theory for deep learnin...
Two aspects of neural networks that have been extensively studied in the recent literature are their...
We establish in this work approximation results of deep neural networks for smooth functions measure...
While deep learning is successful in a num...
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent i...
Recently, deep Convolutional Neural Networks (CNNs) have proven to be successful when employed in ar...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
The Lipschitz constant is an important quantity that arises in analysing the convergence of gradient...
The general features of the optimization problem for the case of overparametrized nonlinear networks...
Deep learning has become an important toolkit for data science and artificial intelligence. In contr...
We study the expressivity of deep neural networks. Measuring a network's compl...
The problem of approximating functions by neural networks using incremental algorithms is s...