Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks by training a small number of newly added parameters. In multi-task settings, PEFT adapters typically train on each task independently, inhibiting transfer across tasks, or on the concatenation of all tasks, which can lead to negative interference. To address this, Polytropon (Ponti et al.) jointly learns an inventory of PEFT adapters and a routing function to share variable-size sets of adapters across tasks. Subsequently, adapters can be re-combined and fine-tuned on novel tasks even with limited data. In this paper, we investigate to what extent the ability to control which adapters are active for each task leads to sample-efficient general...
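To make the routing idea above concrete, here is a minimal sketch of Polytropon-style skill routing over an inventory of LoRA adapters. It is an illustrative assumption, not the authors' implementation: the class name `SkillRoutedLinear`, the sigmoid routing, and the hyperparameters are hypothetical, and the real method applies such routing across many layers and tasks.

```python
# Minimal sketch (assumption): a linear layer whose LoRA update is a
# task-routed mix of K "skill" adapters, in the spirit of Polytropon.
import torch
import torch.nn as nn


class SkillRoutedLinear(nn.Module):
    """Frozen base weight plus a task-routed combination of low-rank adapters."""

    def __init__(self, d_in, d_out, n_tasks, n_skills, rank=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)          # stands in for a pretrained weight
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        # One low-rank adapter (A, B) per skill in the inventory.
        self.A = nn.Parameter(torch.randn(n_skills, d_in, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_skills, rank, d_out))
        # Learned task-to-skill routing logits (which adapters each task uses).
        self.router = nn.Parameter(torch.zeros(n_tasks, n_skills))

    def forward(self, x, task_id):
        # Soft routing weights over the skill inventory for this task.
        w = torch.sigmoid(self.router[task_id])
        w = w / (w.sum() + 1e-6)
        # Mix the low-rank factors according to the routing weights.
        A = torch.einsum("s,sir->ir", w, self.A)
        B = torch.einsum("s,sro->ro", w, self.B)
        return self.base(x) + x @ A @ B


layer = SkillRoutedLinear(d_in=768, d_out=768, n_tasks=8, n_skills=4)
out = layer(torch.randn(2, 768), task_id=3)
print(out.shape)  # torch.Size([2, 768])
```

Because only the adapters and the routing logits require gradients, adapting to a novel task amounts to learning (or re-combining) a new row of routing weights and lightly fine-tuning the selected adapters.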
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updat...
Language model fine-tuning is essential for modern natural language processing, but is computational...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting the pre-trained langu...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
Adapting large-scale pretrained models to various downstream tasks via fine-tuning is a standard met...
Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream appro...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
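The BitFit recipe is simple enough to show in a short sketch. The snippet below is a hedged illustration of the typical setup, not the paper's released code: the helper name `apply_bitfit`, the toy encoder layer, and the optimizer settings are assumptions; the core idea is that only parameters whose name contains "bias" remain trainable.

```python
# Minimal sketch (assumption): BitFit-style sparse fine-tuning that freezes
# every parameter except bias terms.
import torch
from torch import nn


def apply_bitfit(model: nn.Module) -> nn.Module:
    """Freeze all parameters except bias terms."""
    for name, param in model.named_parameters():
        param.requires_grad = "bias" in name
    return model


# Example on a toy Transformer encoder layer.
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
apply_bitfit(layer)

trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable} / {total} parameters")

# Only the trainable (bias) parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in layer.parameters() if p.requires_grad], lr=1e-4
)
```

The printed ratio makes the parameter savings explicit: biases account for a very small fraction of the layer's parameters, which is what makes the method attractive for storage-constrained multi-task deployment.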
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on ...
Parameter-efficient fine-tuning methods (PEFTs) offer the promise of adapting large pre-trained mode...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
The current modus operandi in adapting pre-trained models involves updating all the backbone paramet...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
We present Generalized LoRA (GLoRA), an advanced approach for universal parameter-efficient fine-tun...
Transformer-based pre-trained models with millions of parameters require large storage. Recent appro...