Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-shot learning without changing model parameters. However, as we show, fine-tuning an LLM on any specific task generally destroys its in-context ability. We discover an important cause of this loss, format specialization, where the model overfits to the format of the fine-tuned task and is unable to output anything beyond this format. We further show that format specialization happens at the beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that preserves in-context abilities of the pretrained model. ProMoT first trains a soft prompt for ...
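To make the two-stage recipe concrete, below is a minimal, self-contained PyTorch sketch. The toy model (`SoftPromptLM`), its sizes, and the training loop are illustrative stand-ins rather than the paper's actual setup, and the second stage shown here, updating the model weights while the learned soft prompt is held fixed, is an assumption about how ProMoT proceeds, since the abstract is truncated at this point.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained LM: embeds token ids, prepends a trainable
# soft prompt, and predicts next-token logits. All names and sizes here are
# illustrative, not taken from the ProMoT paper.
class SoftPromptLM(nn.Module):
    def __init__(self, vocab_size=100, d_model=32, prompt_len=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        self.backbone = nn.GRU(d_model, d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids):
        tok = self.embed(input_ids)                              # (B, T, D)
        prompt = self.soft_prompt.expand(tok.size(0), -1, -1)    # (B, P, D)
        hidden, _ = self.backbone(torch.cat([prompt, tok], dim=1))
        return self.lm_head(hidden[:, -input_ids.size(1):])      # logits for the real tokens

def train(model, params, batches, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for input_ids, labels in batches:
        logits = model(input_ids)
        loss = loss_fn(logits.reshape(-1, logits.size(-1)), labels.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()

model = SoftPromptLM()
data = [(torch.randint(0, 100, (4, 16)), torch.randint(0, 100, (4, 16))) for _ in range(10)]

# Stage 1 (prompt tuning): only the soft prompt is trainable, the model is frozen.
for p in model.parameters():
    p.requires_grad_(False)
model.soft_prompt.requires_grad_(True)
train(model, [model.soft_prompt], data)

# Stage 2 (model tuning): assumed here to update the model weights while the
# learned soft prompt stays fixed.
for p in model.parameters():
    p.requires_grad_(True)
model.soft_prompt.requires_grad_(False)
train(model, [p for p in model.parameters() if p.requires_grad], data)
```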
Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PL...
In-context learning (ICL) has become the default method for using large language models (LLMs), maki...
With a handful of demonstration examples, large-scale language models show strong capability to perf...
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a...
Language model fine-tuning is essential for modern natural language processing, but is computational...
Consistency is a key requirement of high-quality translation. It is especially important to adhere t...
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown e...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
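As a rough illustration of the idea, the following minimal PyTorch sketch freezes everything except parameters whose names mark them as bias terms; the generic transformer encoder is an assumed stand-in, not the models evaluated in the BitFit paper.

```python
import torch.nn as nn

# Minimal sketch of bias-only fine-tuning in the spirit of BitFit: freeze every
# parameter except those whose name identifies a bias term. The encoder below
# is a generic stand-in, not the BERT setup used in the paper.
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

trainable = []
for name, param in model.named_parameters():
    is_bias = "bias" in name
    param.requires_grad_(is_bias)
    if is_bias:
        trainable.append(param)

print(f"training {sum(p.numel() for p in trainable)} of "
      f"{sum(p.numel() for p in model.parameters())} parameters")
# An optimizer would then be built over `trainable` only,
# e.g. torch.optim.AdamW(trainable, lr=1e-4).
```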
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
Open-sourced large language models (LLMs) have demonstrated remarkable efficacy in various tasks wit...
Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, whe...
Through in-context learning (ICL), large-scale language models are effective few-shot learners witho...
Fine-tuning large language models for different tasks can be costly and inefficient, and even method...
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstrea...
Few-shot in-context learning (ICL) enables pre-trained language models to perform a previously-unsee...