In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model's sparsity increases, the performance first worsens, then improves, and finally deteriorates. Such non-monotonic behavior raises serious questions about the optimal model size needed to maintain high performance: the model needs to be sufficiently over-parametrized, but having too many parameters wastes training resources. In this paper, we aim to find the best trade-off efficiently. More precisely, we tackle the occurrence of the sparse double descent and present some solutions to avoid it. Firstly, we show that a simp...
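To make the phenomenon concrete, the sketch below sweeps a grid of sparsity levels with one-shot magnitude pruning and fine-tuning, logging test accuracy at each level; on a real benchmark (e.g. an image-classification task with a sufficiently large network) this is the kind of sparsity-versus-accuracy curve on which the sparse double descent appears. This is a minimal sketch assuming a PyTorch setup; the toy model, synthetic data, sparsity grid, and hyper-parameters are illustrative placeholders, not the experimental protocol of any of the papers listed here.

```python
# Minimal sketch: sweep sparsity levels with magnitude pruning and record test
# accuracy, i.e. the curve on which the sparse double descent is observed.
# Model, data, and hyper-parameters are illustrative placeholders.
import copy
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Toy dataset and over-parameterized MLP (stand-ins for a real benchmark + network).
X_train, y_train = torch.randn(2048, 64), torch.randint(0, 10, (2048,))
X_test, y_test = torch.randn(512, 64), torch.randint(0, 10, (512,))

def make_model():
    return nn.Sequential(nn.Linear(64, 1024), nn.ReLU(), nn.Linear(1024, 10))

def train(model, epochs=20):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(X_train), y_train)
        loss.backward()
        opt.step()

def accuracy(model):
    with torch.no_grad():
        return (model(X_test).argmax(1) == y_test).float().mean().item()

dense = make_model()
train(dense)

# Prune an increasing fraction of the smallest-magnitude weights, fine-tune,
# and log test accuracy at each sparsity level.
for sparsity in [0.0, 0.5, 0.8, 0.9, 0.95, 0.99, 0.999]:
    model = copy.deepcopy(dense)
    for module in model:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=sparsity)
    train(model)  # fine-tune with the pruning masks in place
    print(f"sparsity={sparsity:.3f}  test_acc={accuracy(model):.3f}")
```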
Sparsity is commonly produced from model compression (i.e., pruning), which eliminates unnecessary p...
Recently, sparse training methods have started to be established as a de facto approach for training...
Nowadays, the explosive increase in data scale provides an unprecedented opportunity to apply machine l...
Finding the optimal size of deep learning models is highly relevant and of broad impact, especially in e...
Deep learning has been empirically successful in recent years thanks to the extremely over-parameter...
The training of sparse neural networks is becoming an increasingly important tool for reducing the ...
The growing energy and performance costs of deep learning have driven the community to reduce the si...
Deep networks are typically trained with many more parameters than the size of the training dataset....
It is widely believed that the success of deep networks lies in their ability to learn a meaningful ...
Deep neural networks (DNNs) are the state-of-the-art machine learning models, outperforming traditiona...
Sparse coding is a crucial subroutine in algorithms for various signal processing, deep learning, an...
Progress in Machine Learning is being driven by continued growth in model size, training data and al...
Sparse machine learning has recently emerged as a powerful tool to obtain models of high-dimensional d...
In deep learning it is common to overparameterize neural networks, that is, to use more parameters t...
Overparameterized neural networks generalize well but are expensive to train. Ideally, one would lik...