Neural networks have achieved remarkable empirical performance, yet current theoretical analysis is inadequate for understanding their success: the Neural Tangent Kernel approach fails to capture their key feature-learning ability, and recent analyses of feature learning are typically problem-specific. This work proposes a unified analysis framework for two-layer networks trained by gradient descent. The framework is centered on the principle of feature learning from gradients, and its effectiveness is demonstrated through applications to several prototypical problems, such as mixtures of Gaussians and parity functions. The framework also sheds light on interesting network learning phenomena such as feature learning beyond kernels...
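To make the setting concrete, here is a minimal, self-contained NumPy sketch of the kind of setup such analyses consider: a two-layer ReLU network trained by full-batch gradient descent on a sparse parity task. All dimensions, hyperparameters, and the specific task below are illustrative assumptions, not the construction or guarantees of the work described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes, chosen for illustration only.
d, k, n, width = 20, 2, 2000, 128   # input dim, parity size, samples, hidden width
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.prod(X[:, :k], axis=1)       # label = parity (product) of the first k coordinates

# Two-layer ReLU network f(x) = a^T relu(W x); both layers are trained.
W = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))
a = rng.normal(scale=1.0 / np.sqrt(width), size=width)

lr, steps = 0.1, 1000
for t in range(steps):
    pre = X @ W.T                    # (n, width) pre-activations
    h = np.maximum(pre, 0.0)         # ReLU features
    err = h @ a - y                  # squared-loss residuals
    grad_a = h.T @ err / n           # gradient for the output layer
    grad_W = ((err[:, None] * a) * (pre > 0)).T @ X / n   # gradient for the first layer
    a -= lr * grad_a
    W -= lr * grad_W                 # first-layer updates are where features can emerge

h = np.maximum(X @ W.T, 0.0)
acc = np.mean(np.sign(h @ a) == y)
print(f"train accuracy after {steps} full-batch GD steps: {acc:.3f}")
```

Training both layers (rather than freezing the first layer, as in kernel-regime analyses) is what allows the first-layer weights to align with the relevant input coordinates; that alignment is the informal sense of "feature learning from gradients" referred to above.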
We provide novel guaranteed approaches for training feedforward neural networks with sparse connecti...
Inspired by the theory of wide neural networks (NNs), kernel learning and feature learning have rece...
The generalization mystery in deep learning is the following: Why do over-parameterized neural netwo...
Recently, several studies have proven the global convergence and generalization abilities of the gra...
We analyze feature learning in infinite-width neural networks trained with gradient flow through a s...
This work establishes low test error of gradient flow (GF) and stochastic gradient descent (SGD) on ...
In this work, we provide a characterization of the feature-learning process in two-layer ReLU networ...
Understanding the impact of data structure on the computational tractability of learning is a key ch...
Neural networks have achieved impressive results on many technological and scientific tasks. Yet, th...
Heuristics centered around gradient descent and function approximation by neural networks have prove...
Understanding how feature learning affects generalization is among the foremost goals of modern deep...
In this thesis, we theoretically analyze the ability of neural networks trained by gradient descent ...
Recent results on optimization and generalization properties of neural networks showed that in a sim...
We study the diversity of the features learned by a two-layer neural network trained with the least ...