Fine-tuning pre-trained models has proven effective across a wide range of NLP tasks. However, fine-tuning the whole model is parameter-inefficient, as it yields an entirely new model for each task. Many recent works therefore propose fine-tuning only a small portion of the parameters while keeping most parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their fully fine-tuned counterparts. However, such methods are still not well understood. Some natural questions arise: How does parameter sparsity lead to promising performance? Why are these models more stable than fully fine-tuned ones? How to cho...
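The core idea this abstract describes — updating only a small subset of parameters while the rest stay at their shared pre-trained values — can be sketched as a masked update step. This is a minimal illustration, not any specific paper's method; the parameter values, gradients, and mask below are invented for the example:

```python
# Sketch of a sparse fine-tuning step: only parameters selected by the
# binary mask receive a gradient update; all others keep their
# pre-trained values and can be shared across tasks.
pretrained = [0.5, -1.2, 0.3, 0.8, -0.4]   # illustrative pre-trained weights
mask = [1, 0, 0, 1, 0]                      # 1 = task-specific, 0 = frozen/shared
gradients = [0.1, 0.2, -0.3, 0.4, 0.5]      # illustrative task gradients
lr = 0.1                                    # learning rate

# Apply the update only where the mask is set.
task_params = [w - lr * g if m else w
               for w, g, m in zip(pretrained, gradients, mask)]

print(task_params)
```

Here only 2 of 5 parameters diverge from the pre-trained model, so storing a new task amounts to storing the mask and the few updated values rather than a full copy of the model.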
Recent advancements in Large Language Models (LLMs) have enabled the development of a single model c...
Why can pre-trained language models (PLMs) learn universal representations and effectively adapt to ...
This paper aims to compare different regularization strategies to address a common phenomenon, sever...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-effi...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
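BitFit's selection rule is concrete enough to sketch: freeze all weights and fine-tune only the bias terms. The snippet below shows just that selection logic in plain Python; the parameter names and sizes mimic common PyTorch-style naming but are illustrative, not taken from the paper:

```python
# Illustrative parameter table: name -> number of scalar parameters.
params = {
    "encoder.layer.0.attention.weight": 1000,
    "encoder.layer.0.attention.bias": 10,
    "encoder.layer.0.ffn.weight": 4000,
    "encoder.layer.0.ffn.bias": 20,
}

def bitfit_trainable(param_names):
    """Return the subset of parameter names to fine-tune: bias terms only."""
    return [name for name in param_names if name.endswith(".bias")]

trainable = bitfit_trainable(params)
# Fraction of the model that stays frozen (i.e., shared across tasks).
frozen_fraction = 1 - sum(params[n] for n in trainable) / sum(params.values())

print(trainable)
```

Even in this toy table, over 99% of parameters remain frozen, which is the sense in which the abstract calls the method "sparse".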
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
With the dramatically increased number of parameters in language models, sparsity methods have recei...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updat...
The advancement of neural network models has led to state-of-the-art performance in a wide range of ...
Fine-tuning can be used to tackle domain-specific tasks by transferring knowledge learned from pre-tr...
Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream appro...
The tuning of learning algorithm parameters has become increasingly important in recent years...