A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-efficient transfer learning by updating only a small set of additional parameters while keeping the parameters of the pretrained language model frozen. While these methods have proven effective, there are no existing studies on whether and how such knowledge of the downstream fine-tuning approach should affect the pretraining stage. In this work, we show that taking the ultimate choice of fine-tuning method into consideration boosts the performance of parameter-efficient fine-tuning. By relying on optimization-based meta-learning using MAML, with certain modifications for our distinct purpose, we prime the pretrained model specifically for parameter-efficient fine-tuning...
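The abstract above gives no implementation details, so the following is a minimal, hypothetical sketch of what MAML-style priming for parameter-efficient fine-tuning could look like in PyTorch, using a first-order approximation of MAML. The function names (`prime_for_peft`, `inner_adapt`), the `peft_filter` predicate that selects which parameters the downstream PEFT method would update (e.g., bias terms or adapter weights), and the `task_sampler` are all illustrative assumptions, not the authors' code.

```python
# First-order MAML sketch for priming a pretrained model for parameter-efficient
# fine-tuning (PEFT). All names are illustrative assumptions, not released code.
import copy
import torch


def inner_adapt(model, support_batch, loss_fn, peft_filter, inner_lr=1e-3, inner_steps=5):
    """Simulate downstream PEFT: update only the parameters selected by peft_filter."""
    adapted = copy.deepcopy(model)
    peft_params = [p for n, p in adapted.named_parameters() if peft_filter(n)]
    opt = torch.optim.SGD(peft_params, lr=inner_lr)
    for _ in range(inner_steps):
        loss = loss_fn(adapted, support_batch)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted


def prime_for_peft(model, task_sampler, loss_fn, peft_filter, meta_lr=1e-5, meta_steps=1000):
    """Outer loop: update the pretrained weights so that the PEFT inner loop
    generalizes well to each task's held-out (query) data."""
    meta_opt = torch.optim.Adam(model.parameters(), lr=meta_lr)
    for _ in range(meta_steps):
        support_batch, query_batch = task_sampler()
        adapted = inner_adapt(model, support_batch, loss_fn, peft_filter)
        query_loss = loss_fn(adapted, query_batch)
        # First-order approximation: gradients of the query loss are taken at the
        # adapted parameters and copied back onto the original pretrained parameters.
        grads = torch.autograd.grad(query_loss, list(adapted.parameters()), allow_unused=True)
        meta_opt.zero_grad()
        for p, g in zip(model.parameters(), grads):
            if g is not None:
                p.grad = g.detach()
        meta_opt.step()
    return model
```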
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a...
Many classification algorithms, such as Neural Networks and Support Vector Machines, have a range of...
The exponential growth of volume, variety and velocity of the data is raising the need for investiga...
Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained mode...
Model-agnostic meta-learning (MAML) is a meta-learning technique to train a model on a multitude of ...
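For reference, the standard MAML objective optimizes the initialization $\theta$ so that one (or a few) gradient steps on a task's support set minimize the loss on its query set; with a single inner step of size $\alpha$ it reads:

$$\min_\theta \sum_{\tau} \mathcal{L}_\tau^{\text{query}}\!\Big(\theta - \alpha \, \nabla_\theta \mathcal{L}_\tau^{\text{support}}(\theta)\Big)$$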
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
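As a concrete illustration of the bias-only tuning that BitFit describes, here is a minimal sketch assuming a Hugging Face `transformers` model; the checkpoint name and the simple substring test on parameter names are assumptions for illustration, and in practice the newly initialized task head is typically left trainable as well.

```python
# Illustrative bias-only fine-tuning setup (a sketch, not the BitFit authors' code):
# freeze every parameter whose name does not contain "bias".
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
for name, param in model.named_parameters():
    # Keep only bias terms (and, as is common, the classifier head) trainable.
    param.requires_grad = ("bias" in name) or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total} parameters")  # a small fraction of the model
```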
Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP t...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream appro...
Fine-tuning large language models for different tasks can be costly and inefficient, and even method...
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on ...
Language model fine-tuning is essential for modern natural language processing, but is computational...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
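The abstract above only names prototypical networks; as background, the standard prototypical-network rule builds one prototype per class as the mean of the embedded support examples and classifies a query by its distance to each prototype. A minimal sketch follows, assuming embeddings come from some encoder; all names here are illustrative.

```python
# Minimal prototypical-network classification sketch (illustrative, not tied to
# the paper above): class prototypes are mean support embeddings, and queries
# get logits equal to negative squared Euclidean distance to each prototype.
import torch


def prototypes(support_emb: torch.Tensor, support_labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """support_emb: [N, D] embeddings; support_labels: [N] ints in [0, num_classes)."""
    return torch.stack([support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)])


def classify(query_emb: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    """Return [Q, num_classes] logits as negative squared distances to prototypes."""
    return -torch.cdist(query_emb, protos).pow(2)
```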
Finetuning can be used to tackle domain-specific tasks by transferring knowledge learned from pre-tr...
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updat...