Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a pre-training stage that uses a very large, diverse dataset of text and a fine-tuning (sometimes, 'alignment') stage that uses targeted examples or other specifications of desired behaviors. While it has been hypothesized that knowledge and skills come from pre-training, and fine-tuning mostly filters this knowledge and skillset, this intuition has not been extensively tested. To aid in doing so, we introduce a novel technique for decoupling the knowledge and skills gained in these two stages, enabling a direct answer to the question, "What would happen if we combined the knowledge learned by a large model during pre-training with the knowled...
Recent work has shown that fine-tuning large pre-trained language models on a collection of tasks de...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
In recent years, large language models such as BERT and GPT-2 have revolutionized the field of Natur...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PL...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
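The bias-only idea described above can be illustrated in a few lines. This is a hedged sketch, not the authors' released code: it assumes a Hugging Face Transformers classifier, and the checkpoint name, learning rate, and the choice to also keep the newly initialized task head trainable are illustrative assumptions.

import torch
from transformers import AutoModelForSequenceClassification

# Illustrative checkpoint and task setup (assumptions, not from the abstract).
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# BitFit-style selection: keep only bias terms trainable, plus the task head
# (the head is newly initialized here, so the sketch assumes it is also tuned).
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias") or name.startswith("classifier")

trainable = [p for p in model.parameters() if p.requires_grad]
print("trainable parameters:", sum(p.numel() for p in trainable))

# Optimize only the small trainable subset.
optimizer = torch.optim.AdamW(trainable, lr=1e-4)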
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Language model fine-tuning is essential for modern natural language processing, but is computational...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
Fine-tuning large language models for different tasks can be costly and inefficient, and even method...
Deploying large language models (LLMs) is challenging because they are memory inefficient and comput...
Recent advancements in Large Language Models (LLMs) have enabled the development of a single model c...
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstrea...