We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity. Spartan is based on a combination of two techniques: (1) soft top-k masking of low-magnitude parameters via a regularized optimal transportation problem and (2) dual averaging-based parameter updates with hard sparsification in the forward pass. This scheme realizes an exploration-exploitation tradeoff: early in training, the learner is able to explore various sparsity patterns, and as the soft top-k approximation is gradually sharpened over the course of training, the balance shifts towards parameter optimization with respect to a fixed sparsity mask. Spartan is sufficiently flexible to accommodate a variety of sparsity allocation...
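To make the two ingredients concrete, below is a minimal PyTorch sketch of (1) a soft top-k mask computed by solving an entropy-regularized optimal transport problem with log-domain Sinkhorn iterations, and (2) a linear layer that applies a hard top-k magnitude mask in the forward pass while passing gradients straight through to the dense weights. This is an illustration of the general techniques the abstract names, not the authors' reference implementation: the names `soft_topk_mask`, `TopKLinear`, `beta`, and `density` are ours, and the straight-through update is a simple stand-in for the paper's dual averaging scheme.

```python
import math
import torch
import torch.nn.functional as F


def soft_topk_mask(scores: torch.Tensor, k: int, beta: float = 10.0,
                   n_iters: int = 50) -> torch.Tensor:
    """Soft top-k mask via entropy-regularized optimal transport.

    Transports a uniform distribution over the n scores onto a two-point
    target with masses ((n - k)/n, k/n); the mass each score sends to the
    "keep" column is its fractional mask value. Solved with log-domain
    Sinkhorn iterations; as beta grows, the mask sharpens toward the hard
    top-k indicator. (Illustrative sketch, not the paper's code.)
    """
    s = scores.flatten()
    n = s.numel()
    # Column 0 = "drop" (cost 0), column 1 = "keep" (cost -score, so that
    # high-magnitude entries are cheap to keep).
    cost = torch.stack([torch.zeros_like(s), -s], dim=1)              # (n, 2)
    log_mu = torch.full((n,), -math.log(n), dtype=s.dtype)            # rows: uniform
    log_nu = torch.tensor([(n - k) / n, k / n], dtype=s.dtype).log()  # cols: drop/keep
    f = torch.zeros(n, dtype=s.dtype)   # row (per-parameter) potentials
    g = torch.zeros(2, dtype=s.dtype)   # column (drop/keep) potentials
    for _ in range(n_iters):
        # Alternately project onto the row and column marginal constraints.
        f = log_mu - torch.logsumexp(g.unsqueeze(0) - beta * cost, dim=1)
        g = log_nu - torch.logsumexp(f.unsqueeze(1) - beta * cost, dim=0)
    # Transport plan: P_ij = exp(f_i + g_j - beta * C_ij). The mask is the
    # mass sent to "keep", rescaled by n so entries lie in [0, 1] and sum to k.
    return (n * torch.exp(f + g[1] - beta * cost[:, 1])).reshape(scores.shape)


class TopKLinear(torch.nn.Module):
    """Linear layer that applies a hard top-k magnitude mask in the forward
    pass while routing gradients straight through to the dense weights
    (one simple way to realize dual-averaging-style updates)."""

    def __init__(self, in_features: int, out_features: int, density: float = 0.1):
        super().__init__()
        self.weight = torch.nn.Parameter(0.02 * torch.randn(out_features, in_features))
        self.k = max(1, int(density * self.weight.numel()))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Keep the k largest-magnitude weights; zero out the rest.
        threshold = w.abs().flatten().topk(self.k).values.min()
        hard_mask = (w.abs() >= threshold).to(w.dtype)
        # Straight-through estimator: the forward value is the hard-masked
        # weight, but the backward gradient flows to the dense weight.
        w_ste = w + (w * hard_mask - w).detach()
        return F.linear(x, w_ste)
```

In this reading, annealing `beta` upward over the course of training is what realizes the exploration-exploitation tradeoff described above: a small `beta` yields a diffuse mask that lets gradient signal reach many candidate sparsity patterns, while a large `beta` approaches optimization under a fixed hard mask.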