Sparsity is a highly desired feature in deep neural networks (DNNs), since it ensures numerical efficiency, improves the interpretability of models (due to the smaller number of relevant features), and increases robustness. In machine learning approaches based on linear models, it is well known that there exists a connecting path between the sparsest solution in terms of the $\ell^1$ norm (i.e., zero weights) and the non-regularized solution, which is called the regularization path. Very recently, a first attempt was made to extend the concept of regularization paths to DNNs by treating the empirical loss and sparsity (the $\ell^1$ norm) as two conflicting criteria and solving the resulting multiobjective optimization problem. However, due to th...
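As a rough illustration of the two-criteria setup described above, the sketch below traces an approximate regularization path by simple weighted-sum scalarization: for a sweep of trade-off weights $\lambda$, it minimizes the empirical loss plus $\lambda$ times the $\ell^1$ norm of the weights. This is a minimal sketch assuming PyTorch; the helper names (`l1_norm`, `trace_regularization_path`), the synthetic data, and all hyperparameters are illustrative assumptions, and a plain $\lambda$ sweep is not the multiobjective continuation method the abstract refers to.

```python
# Illustrative sketch (assumes PyTorch): approximate regularization path via
# weighted-sum scalarization of the two conflicting criteria, empirical loss
# L(theta) and sparsity ||theta||_1. Names and hyperparameters are hypothetical.
import torch
import torch.nn as nn

def l1_norm(model):
    # Sparsity criterion: sum of absolute values of all network weights.
    return sum(p.abs().sum() for p in model.parameters())

def trace_regularization_path(model_fn, X, y, lams, epochs=200, lr=1e-2):
    # For each trade-off weight lam, minimize loss + lam * ||theta||_1
    # and record (lam, empirical loss, number of nonzero weights).
    path = []
    for lam in lams:
        model = model_fn()  # fresh network for each point on the path
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            obj = loss_fn(model(X), y) + lam * l1_norm(model)
            obj.backward()
            opt.step()
        with torch.no_grad():
            emp = loss_fn(model(X), y).item()
            nnz = sum(int((p.abs() > 1e-6).sum()) for p in model.parameters())
        path.append((lam, emp, nnz))
    return path

# Example usage on synthetic regression data.
X, y = torch.randn(128, 10), torch.randn(128, 1)
make_net = lambda: nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
for lam, emp, nnz in trace_regularization_path(make_net, X, y,
                                               lams=[0.0, 1e-3, 1e-2, 1e-1]):
    print(f"lambda={lam:g}  loss={emp:.4f}  nonzeros={nnz}")
```

A known limitation of weighted-sum scalarization is that it can only reach points on the convex hull of the Pareto front, which is one motivation for dedicated multiobjective continuation methods.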
Regularizing the gradient norm of the output of a neural network is a powerful technique, rediscover...
Convex $\ell_1$ regularization using an infinite dictionary of neurons has been suggested for constr...
We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm ...
The general features of the optimization problem for the case of overparametrized nonlinear networks...
Understanding the fundamental principles behind the success of deep neural networks is one of the mo...
Deep neural networks (DNNs) are the state-of-the-art machine learning models, outperforming traditiona...
Deep neural networks have relieved human experts of a great deal of the burden of feature en...
Classical statistical learning theory implies that fitting too many parameters leads to overfitt...
We identify and prove a general principle: $L_1$ sparsity can be achieved using a redundant parametr...
Neural networks are more expressive when they have multiple layers. In turn, conventional training m...
The leaky ReLU network with a group sparse regularization term has been widely used in the recent ye...
Deep learning has been empirically successful in recent years thanks to the extremely over-parameter...
We study the theory of neural networks (NNs) through the lens of classical nonparametric regression probl...
In this paper we propose a general framework to characterize and solve the optimization problems und...
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, e...