We solve an open question from Lu et al. (2017), by showing that any target network with inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent of the target network's architecture), whose number of parameters is essentially larger only by a linear factor. In light of previous depth separation theorems, which imply that a similar result cannot hold when the roles of width and depth are interchanged, it follows that depth plays a more significant role than width in the expressive power of neural networks. We extend our results to constructing networks with bounded weights, and to constructing networks with width at most $d+2$, which is close to the minimal possible width due to previous lower bounds. Both of th...
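As a rough illustration of the width-versus-depth trade-off described above, here is a minimal numerical sketch of how a wide, shallow ReLU network can be emulated by a narrow, deep one. This is not the paper's construction and is deliberately looser than its bounds: it uses width d+3 rather than d+2, assumes a one-hidden-layer target, and assumes nonnegative inputs so that ReLU acts as the identity on coordinates that are merely carried forward. The function names `wide_net` and `narrow_net` are illustrative, not from the paper.

```python
# Sketch (under the assumptions stated above): emulate
#   f(x) = sum_i v_i * relu(w_i . x + b_i),  x in [0, 1]^d,
# with a deep ReLU network whose layers all have width d + 3.
# State layout per layer: [x_1..x_d, p, n, t], where p and n are nonnegative
# accumulators for the positive and negative parts of the running sum, and t
# is scratch space for the hidden neuron currently being simulated.
import numpy as np

def wide_net(x, W, b, v):
    """Target network: a single hidden ReLU layer of width k."""
    return v @ np.maximum(W @ x + b, 0.0)

def narrow_net(x, W, b, v):
    """Emulate wide_net with ReLU layers of width d + 3 (two layers per hidden neuron)."""
    d = x.shape[0]
    state = np.concatenate([x, [0.0, 0.0, 0.0]])        # (x, p=0, n=0, t=0)
    for w_i, b_i, v_i in zip(W, b, v):
        # Layer A: t <- relu(w_i . x + b_i); x, p, n pass through (all stay >= 0).
        A = np.zeros((d + 3, d + 3))
        A[:d + 2, :d + 2] = np.eye(d + 2)
        A[d + 2, :d] = w_i
        bias_A = np.zeros(d + 3)
        bias_A[d + 2] = b_i
        state = np.maximum(A @ state + bias_A, 0.0)
        # Layer B: add |v_i| * t to p (if v_i >= 0) or to n (if v_i < 0);
        # row d+2 of B is zero, so the scratch slot resets to 0.
        B = np.zeros((d + 3, d + 3))
        B[:d + 2, :d + 2] = np.eye(d + 2)
        B[d if v_i >= 0 else d + 1, d + 2] += abs(v_i)
        state = np.maximum(B @ state, 0.0)
    return state[d] - state[d + 1]                      # linear output layer: p - n

rng = np.random.default_rng(0)
d, k = 4, 6
W, b, v = rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=k)
x = rng.uniform(size=d)                                 # nonnegative input
print(np.isclose(wide_net(x, W, b, v), narrow_net(x, W, b, v)))  # True
```

The two nonnegative accumulators are the key design choice here: the running sum itself may be negative and would be destroyed by ReLU, but its positive and negative parts survive every layer unchanged. The paper's actual result is stronger, handling arbitrary target architectures with width d+2 and only a polynomial blow-up in the number of parameters.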