The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation time, as well as enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient-based training on model sparsity. In this work, we introduce Powerpropagation, a new weight parameterisation for neural networks that leads to inherently sparse models. Exploiting the behaviour of gradient descent, our method gives rise to weight updates exhibiting a "rich get richer" dynamic, leaving low-magnitude parameters largely unaffected by learning. Models trained...
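To make the "rich get richer" dynamic concrete, here is a minimal sketch of a Powerpropagation-style layer, assuming the effective weight is obtained by a sign-preserving power of the underlying parameter, w = theta * |theta|^(alpha - 1). Under this assumption the chain rule multiplies the gradient w.r.t. theta by a factor proportional to |theta|^(alpha - 1), so near-zero parameters receive vanishing updates while large-magnitude ones keep growing. The class name, initialisation scale, and default alpha below are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PowerpropLinear(nn.Module):
    """Linear layer with a Powerpropagation-style reparameterisation (sketch).

    The effective weight is w = theta * |theta|**(alpha - 1), so the gradient
    w.r.t. theta is scaled by alpha * |theta|**(alpha - 1): small-magnitude
    parameters are barely updated, large ones dominate ("rich get richer").
    """

    def __init__(self, in_features: int, out_features: int, alpha: float = 2.0):
        super().__init__()
        self.alpha = alpha  # alpha = 1 recovers a standard linear layer
        # Initialisation scale is an illustrative choice for this sketch.
        self.theta = nn.Parameter(0.05 * torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def effective_weight(self) -> torch.Tensor:
        # Sign-preserving power transform of the underlying parameter.
        return self.theta * self.theta.abs().pow(self.alpha - 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.effective_weight(), self.bias)


# Usage: train as usual; afterwards, magnitude-prune the *effective* weights,
# which tend to be far more concentrated than in a standard parameterisation.
layer = PowerpropLinear(128, 64, alpha=2.0)
out = layer(torch.randn(8, 128))
```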
Works on lottery ticket hypothesis (LTH) and single-shot network pruning (SNIP) have raised a lot of...
The ever-increasing number of parameters in deep neural networks poses challenges for memory-limited...
We investigate filter level sparsity that emerges in convolutional neural networks (CNNs) which empl...
The growing energy and performance costs of deep learning have driven the community to reduce the si...
Deep neural networks exploiting millions of parameters are nowadays the norm in deep learning applic...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
Although increasing model size can enhance the adversarial robustness of deep neural networks, in re...
Deep learning has been empirically successful in recent years thanks to the extremely over-parameter...
Recently, sparse training methods have started to be established as a de facto approach for training...
With the dramatically increased number of parameters in language models, sparsity methods have recei...
In energy-efficient schemes, finding the optimal size of deep learning models is very important and ...
Deep networks are typically trained with many more parameters than the size of the training dataset....
Overparameterized neural networks generalize well but are expensive to train. Ideally, one would lik...
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end ...
Progress in Machine Learning is being driven by continued growth in model size, training data and al...