We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function space and the sample size. We consider a "Parallel NN" variant of deep ReLU networks and show that standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function basis, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error...
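To see one way this equivalence can arise, the display below gives a minimal scalar-weight sketch (an illustrative simplification of the parallel architecture, not the abstract's full matrix-level argument): assume each depth-$L$ parallel subnetwork outputs a fixed dictionary atom $\phi_m(x)$ scaled by an effective coefficient $c_m = \prod_{l=1}^{L} w_{m,l}$, a product of one scalar weight per layer, and let $\lambda$ denote the weight-decay strength.

% Heuristic sketch (assumed scalar-per-layer parameterization): weight decay on a
% depth-L product of weights induces an \ell_{2/L} penalty on the coefficients c_m.
\begin{align*}
  \min_{\;\prod_{l=1}^{L} |w_{m,l}| \,=\, |c_m|} \;\sum_{l=1}^{L} w_{m,l}^{2}
    \;&=\; L\,|c_m|^{2/L}
    && \text{(AM--GM; equality when every } |w_{m,l}| = |c_m|^{1/L}\text{)} \\
  \text{hence}\qquad
  \min_{\text{weights realizing } \{c_m\}} \;\frac{\lambda}{2}\sum_{m}\sum_{l=1}^{L} w_{m,l}^{2}
    \;&=\; \frac{\lambda L}{2}\sum_{m} |c_m|^{2/L},
    && \text{an $\ell_p$ penalty with } p = \tfrac{2}{L} \in (0,1) \text{ for } L > 2.
\end{align*}

Under this simplified parameterization, deeper parallel subnetworks give a smaller exponent $p$ and hence a more strongly sparsity-promoting penalty; the exponent $p = 2/L$ is a consequence of the scalar assumption, not a claim about the paper's exact result.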
Convolutional residual neural networks (ConvResNets), though overparameterized, can achieve remarkab...
We show the sup-norm convergence of deep neural network estimators with a novel adversarial training...
This work finds the analytical expression of the global minima of a deep linear network with weight ...
Models built with deep neural network (DNN) can handle complicated real-world data extremely well, s...
Consider the multivariate nonparametric regression model. It is shown that estimators based on spars...
It is a central problem in both statistics and computer science to understand the theoretical founda...
While deep learning is successful in a num...
In recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that previously ap...
The general features of the optimization problem for the case of overparametrized nonlinear networks...
The landscape of the empirical risk of overparametrized deep convolutional neural networks (DCNNs) i...
In this note, we study how neural networks with a single hidden layer and ReLU activation interpolat...
A main puzzle of deep networks revolves around the absence of overfitting despite overparametrizatio...
In theory, recent results in nonparametric regression show that neural network estimates are able to...
Recent work by Jacot et al. (2018) has shown that training a neural network using gradient descent i...
Recently, many studies have shed light on the high adaptivity of deep neural network methods in nonp...