Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display sample-wise double-descent behavior in the presence of label noise. Random feature models can also display model-wise double-descent if there are narrow bottleneck layers, while deep networks do no...
Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not we...
Two distinct limits for deep learning have been derived as the network width h -> infinity, dependin...
Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and pred...
In this work, we provide a characterization of the feature-learning process in two-layer ReLU networ...
The generalization mystery in deep learning is the following: Why do over-parameterized neural netwo...
This paper provides theoretical insights into why and how deep learning can generalize well, despite...
This is the final version. Available from ICLR via the link in this recordDeep neural networks (DNNs...
Modern deep neural networks are highly over-parameterized compared to the data on which they are tra...
Kernel methods and neural networks are two important schemes in the supervised learning field. The t...
Deep learning has transformed computer vision, natural language processing, and speech recognition. ...
We investigate deep Bayesian neural networks with Gaussian priors on the weights and ReLU-like nonli...
Over-parameterized deep neural networks (DNNs) with sufficient capacity to memorize random noise can...
It is widely believed that the success of deep networks lies in their ability to learn a meaningful ...
Deep Gaussian Process (DGP) as a model prior in Bayesian learning intuitively exploits the expressiv...
Deep learning algorithms (in particular Convolutional Neural Networks, or CNNs) have shown their sup...
Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not we...
Two distinct limits for deep learning have been derived as the network width h -> infinity, dependin...
Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and pred...
In this work, we provide a characterization of the feature-learning process in two-layer ReLU networ...
The generalization mystery in deep learning is the following: Why do over-parameterized neural netwo...
This paper provides theoretical insights into why and how deep learning can generalize well, despite...
This is the final version. Available from ICLR via the link in this recordDeep neural networks (DNNs...
Modern deep neural networks are highly over-parameterized compared to the data on which they are tra...
Kernel methods and neural networks are two important schemes in the supervised learning field. The t...
Deep learning has transformed computer vision, natural language processing, and speech recognition. ...
We investigate deep Bayesian neural networks with Gaussian priors on the weights and ReLU-like nonli...
Over-parameterized deep neural networks (DNNs) with sufficient capacity to memorize random noise can...
It is widely believed that the success of deep networks lies in their ability to learn a meaningful ...
Deep Gaussian Process (DGP) as a model prior in Bayesian learning intuitively exploits the expressiv...
Deep learning algorithms (in particular Convolutional Neural Networks, or CNNs) have shown their sup...
Gradient-based deep-learning algorithms exhibit remarkable performance in practice, but it is not we...
Two distinct limits for deep learning have been derived as the network width h -> infinity, dependin...
Deep learning is a form of machine learning for nonlinear high dimensional pattern matching and pred...