Gigantic pre-trained models have become central to natural language processing (NLP), serving as the starting point for fine-tuning towards a range of downstream tasks. However, two pain points persist for this paradigm: (a) as the pre-trained models grow bigger (e.g., 175B parameters for GPT-3), even the fine-tuning process can be time-consuming and computationally expensive; (b) the fine-tuned model has the same size as its starting point by default, which is neither sensible due to its more specialized functionality, nor practical since many fine-tuned models will be deployed in resource-constrained environments. To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the spars...
Model sparsification in deep learning promotes simpler, more interpretable models with fewer paramet...
Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP t...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
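As a rough illustration of the sparse Mixture-of-Experts idea mentioned in the abstract above, the sketch below shows a minimal MoE layer with top-k routing: a small gating network scores experts per token and only the k highest-scoring experts are evaluated. All sizes, the expert count, and the routing details are placeholder assumptions for illustration, not taken from any specific paper listed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Minimal sparse Mixture-of-Experts layer with top-k gating (illustrative sketch)."""

    def __init__(self, d_model, d_hidden, num_experts=4, top_k=1):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, d_model). Each token is routed to its top-k experts only.
        scores = self.gate(x)                               # (batch, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # keep only k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, k] == e                   # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route a batch of 8 token representations through the layer.
layer = SparseMoELayer(d_model=16, d_hidden=32, num_experts=4, top_k=1)
print(layer(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```

Because only k of the experts run per token, the layer adds learnable capacity while keeping per-token compute roughly constant, which is the property these MoE-based fine-tuning approaches build on.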
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
With the dramatically increased number of parameters in language models, sparsity methods have recei...
Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream appro...
We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fin...
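To make the notion of sparse fine-tuning above concrete, here is a generic sketch (not the specific method of that paper) of fine-tuning under a fixed sparsity pattern: weights are magnitude-pruned once, the resulting masks are stored, and the masks are re-applied after every optimizer step so the sparsity level is preserved throughout training. The 50% unstructured sparsity, the per-layer magnitude criterion, and the helper names are placeholder assumptions.

```python
import torch
import torch.nn as nn

def sparsify_and_fix_masks(model, sparsity=0.5):
    """Magnitude-prune each Linear weight and remember the resulting binary mask.

    Hypothetical helper for illustration; sparsity level and pruning criterion
    are placeholder choices.
    """
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            threshold = w.abs().flatten().kthvalue(int(sparsity * w.numel())).values
            mask = (w.abs() > threshold).float()
            module.weight.data *= mask          # zero out pruned weights
            masks[name] = mask
    return masks

def reapply_masks(model, masks):
    """Re-zero pruned positions so the sparsity pattern stays fixed during fine-tuning."""
    for name, module in model.named_modules():
        if name in masks:
            module.weight.data *= masks[name]

# Toy fine-tuning loop on a small MLP standing in for an LLM block.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
masks = sparsify_and_fix_masks(model, sparsity=0.5)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))
for _ in range(3):
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    reapply_masks(model, masks)   # keep sparsity after each update
```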
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Deep learning has been empirically successful in recent years thanks to the extremely over-parameter...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
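The core mechanism stated in the BitFit abstract, updating only the bias terms while freezing everything else, can be sketched in a few lines of PyTorch. The optional `classifier`-name check for a task head and the toy model below are assumptions for illustration; with a real pre-trained encoder the same call typically leaves well under 1% of parameters trainable.

```python
import torch.nn as nn

def apply_bitfit(model, train_classifier_head=True):
    """Freeze all parameters except bias terms (BitFit-style sparse fine-tuning).

    The `classifier` substring check is an assumption about how the task head
    is named; adjust it for the actual model being fine-tuned.
    """
    trainable = 0
    for name, param in model.named_parameters():
        is_bias = name.endswith(".bias") or name == "bias"
        is_head = train_classifier_head and "classifier" in name
        param.requires_grad = is_bias or is_head
        trainable += param.numel() if param.requires_grad else 0
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable} / {total} ({100 * trainable / total:.2f}%)")

# Toy example on a small encoder-like stack.
model = nn.Sequential(nn.Linear(128, 128), nn.Tanh(), nn.Linear(128, 2))
apply_bitfit(model, train_classifier_head=False)
```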
Large-scale pre-trained language models have achieved impressive results on a wide range of downstre...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
Large Language Models have become the core architecture upon which most modern natural language proc...