Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization. We propose HyperTuning, a novel approach to model adaptation that uses a hypermodel to generate task-specific parameters for a fixed downstream model. We demonstrate a simple setup for hypertuning with HyperT5, a T5-based hypermodel that produces soft prefixes or LoRA parameters for a frozen T5 model from few-shot examples. We train HyperT5 in two stages: first, hyperpretraining with a modified conditional language modeling objective that trains a hypermodel to generate parameters; second, multi-task fine-tuning (MTF) on a large number of diverse lang...
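To make the idea concrete, here is a minimal sketch of the hypermodel-generates-parameters pattern described above: a small hypernetwork pools an encoding of few-shot examples into a task code and emits LoRA matrices for a frozen downstream layer. This is not the HyperT5 implementation; the module names, dimensions, and mean-pooling encoder are illustrative assumptions only.

```python
# Toy sketch (not HyperT5): a hypermodel maps an encoding of few-shot examples
# to LoRA parameters (A, B) for one frozen linear layer of a downstream model.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer whose LoRA update (B @ A) is supplied externally."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)  # downstream model stays frozen
        self.base.bias.requires_grad_(False)

    def forward(self, x, lora_A, lora_B):
        # lora_A: (rank, d_in), lora_B: (d_out, rank), generated per task
        return self.base(x) + x @ lora_A.t() @ lora_B.t()


class HyperModel(nn.Module):
    """Toy hypermodel: pools few-shot example embeddings and emits LoRA matrices."""

    def __init__(self, d_example: int, d_in: int, d_out: int, rank: int):
        super().__init__()
        self.rank, self.d_in, self.d_out = rank, d_in, d_out
        self.encoder = nn.Sequential(nn.Linear(d_example, 256), nn.ReLU())
        self.to_A = nn.Linear(256, rank * d_in)
        self.to_B = nn.Linear(256, d_out * rank)

    def forward(self, fewshot_embeds):
        # fewshot_embeds: (num_examples, d_example); mean-pool into a task code
        task_code = self.encoder(fewshot_embeds).mean(dim=0)
        lora_A = self.to_A(task_code).view(self.rank, self.d_in)
        lora_B = self.to_B(task_code).view(self.d_out, self.rank)
        return lora_A, lora_B


# Usage: only the hypermodel receives gradients; the downstream layer is frozen.
hyper = HyperModel(d_example=64, d_in=32, d_out=32, rank=4)
layer = LoRALinear(32, 32)
fewshot = torch.randn(8, 64)          # stand-in for encoded few-shot examples
A, B = hyper(fewshot)
out = layer(torch.randn(5, 32), A, B)
loss = out.pow(2).mean()
loss.backward()                        # gradients flow to the hypermodel only
```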
Recent advancements in Large Language Models (LLMs) have enabled the development of a single model c...
Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient ...
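As a brief illustration of the prompt-tuning recipe referenced above, the sketch below prepends a small matrix of trainable soft-prompt embeddings to the input embeddings of a frozen backbone. The prompt length, model dimension, and the stand-in Transformer layer are assumptions for illustration, not any specific paper's setup.

```python
# Minimal soft-prompt-tuning sketch: trainable prompt embeddings are prepended
# to the input embeddings, and only they are updated; the backbone is frozen.
import torch
import torch.nn as nn


class SoftPrompt(nn.Module):
    def __init__(self, prompt_len: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, d_model)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)


# Usage with a frozen stand-in backbone (one Transformer encoder layer here):
backbone = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
for p in backbone.parameters():
    p.requires_grad_(False)            # only the soft prompt is tuned

soft_prompt = SoftPrompt(prompt_len=20, d_model=64)
embeds = torch.randn(2, 10, 64)        # stand-in for token embeddings
out = backbone(soft_prompt(embeds))    # shape: (2, 30, 64)
```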
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on ...
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
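The BitFit idea summarized above reduces, in its simplest form, to freezing every parameter except the bias terms. Below is a minimal sketch of that selection rule on a stand-in model; the helper name and the name-based filter are illustrative assumptions, not the paper's released code.

```python
# BitFit-style sparse fine-tuning sketch: freeze every parameter whose name
# does not end in "bias", leaving only bias terms trainable.
import torch.nn as nn


def apply_bitfit(model: nn.Module):
    trainable = []
    for name, param in model.named_parameters():
        if name.endswith("bias"):
            param.requires_grad_(True)
            trainable.append(name)
        else:
            param.requires_grad_(False)
    return trainable


# Usage on a stand-in model: only bias parameters receive gradient updates.
model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
bias_names = apply_bitfit(model)
print(bias_names)  # e.g. ['0.bias', '2.bias']
```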
Large-scale pre-trained language models have achieved impressive results on a wide range of downstre...
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstrea...
Language model fine-tuning is essential for modern natural language processing, but is computational...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...
The advent of hyper-scale and general-purpose pre-trained models is shifting the paradigm of buildin...
Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting the pre-trained langu...
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new langu...
A recent family of techniques, dubbed lightweight fine-tuning methods, facilitates parameter-effi...