Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting the pre-trained language models to downstream tasks while only updating a small number of parameters. Despite the success, most existing methods independently adapt to each task without considering knowledge transfer between tasks and are limited to low-data regimes. To overcome this issue, we propose Prototype-based HyperAdapter (PHA), a novel framework built on the adapter-tuning and hypernetwork. It introduces an instance-dense retriever and a prototypical hypernetwork to generate the conditional modules in a sample-efficient manner. This leads to comparable performance improvements against existing PEFT methods on multi-task learning and few-shot transfer learning...
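Since the abstract only sketches the architecture, the following is a minimal illustrative sketch (not the authors' implementation) of the general idea: a dense retriever picks the nearest learnable prototype for an input instance, and a hypernetwork conditioned on that prototype generates the weights of a bottleneck adapter. All class names, dimensions, and the cosine-similarity retrieval rule below are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeRetriever(nn.Module):
    """Keeps one learnable prototype per task and returns, for each instance
    embedding, the closest prototype (dense retrieval by cosine similarity)."""
    def __init__(self, num_tasks: int, dim: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_tasks, dim))

    def forward(self, instance_emb: torch.Tensor) -> torch.Tensor:
        # instance_emb: (batch, dim); similarities: (batch, num_tasks)
        sims = F.cosine_similarity(
            instance_emb.unsqueeze(1), self.prototypes.unsqueeze(0), dim=-1
        )
        idx = sims.argmax(dim=-1)        # nearest prototype per instance
        return self.prototypes[idx]      # (batch, dim)

class HyperAdapter(nn.Module):
    """Hypernetwork that maps a prototype embedding to the down/up projection
    weights of a bottleneck adapter, then applies that adapter (with a
    residual connection) to the hidden states. One prototype at a time."""
    def __init__(self, proto_dim: int, hidden: int, bottleneck: int):
        super().__init__()
        self.hidden, self.bottleneck = hidden, bottleneck
        self.gen_down = nn.Linear(proto_dim, hidden * bottleneck)
        self.gen_up = nn.Linear(proto_dim, bottleneck * hidden)

    def forward(self, hidden_states: torch.Tensor, proto: torch.Tensor) -> torch.Tensor:
        w_down = self.gen_down(proto).view(self.hidden, self.bottleneck)
        w_up = self.gen_up(proto).view(self.bottleneck, self.hidden)
        h = torch.relu(hidden_states @ w_down) @ w_up
        return hidden_states + h         # adapter output added residually

# Usage sketch: retrieve a prototype for an instance, generate its adapter, apply it.
retriever = PrototypeRetriever(num_tasks=8, dim=64)
adapter = HyperAdapter(proto_dim=64, hidden=256, bottleneck=16)
instance_emb = torch.randn(1, 64)        # pooled encoder representation (assumed)
hidden_states = torch.randn(1, 128, 256) # (batch, seq_len, hidden)
proto = retriever(instance_emb)          # (1, 64)
out = adapter(hidden_states, proto.squeeze(0))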
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
Recent work has found that multi-task training with a large number of diverse tasks can uniformly im...
Large-scale pre-trained language models have achieved impressive results on a wide range of downstre...
Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks b...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Adapting large-scale pretrained models to various downstream tasks via fine-tuning is a standard met...
Fine-tuning large language models for different tasks can be costly and inefficient, and even method...
Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient ...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obta...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
Hyperparameter optimization (HPO) is crucial for fine-tuning machine learning models but can be comp...
We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-effici...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream appro...