In this paper, we move towards combining large parametric models with non-parametric prototypical networks. We propose prototypical fine-tuning, a novel prototypical framework for fine-tuning pretrained language models (LMs), which automatically learns a bias to improve predictive performance for varying data sizes, especially in low-resource settings. Our prototypical fine-tuning approach can automatically adjust the model capacity according to the number of data points and the model's inherent attributes. Moreover, we propose four principles for effective prototypical fine-tuning towards the optimal solution. Experimental results across various datasets show that our work achieves significant performance improvements under various low-resource settings.
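The abstract does not spell out the formulation, but a minimal sketch of the general idea it names, pairing a pretrained encoder with a non-parametric prototypical classifier, could look as follows. The encoder checkpoint, mean pooling, and squared Euclidean distance are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of a prototypical classifier on top of a pretrained LM encoder.
# The checkpoint, mean pooling, and distance metric are illustrative assumptions;
# the paper's actual formulation may differ.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Encode texts into fixed-size vectors via mean pooling over tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)

def build_prototypes(support_texts, support_labels, num_classes):
    """One prototype per class: the mean embedding of its support examples."""
    emb = embed(support_texts)
    labels = torch.tensor(support_labels)
    return torch.stack([emb[labels == c].mean(0) for c in range(num_classes)])  # (C, H)

def classify(query_texts, prototypes):
    """Assign each query to the nearest prototype (negative squared distance as logits)."""
    q = embed(query_texts)                                  # (Q, H)
    logits = -torch.cdist(q, prototypes) ** 2               # (Q, C)
    return logits.argmax(dim=-1)
```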
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
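The abstract is cut off here; as a rough illustration of the architecture family it names, a sparse MoE layer replaces a single feed-forward block with several expert blocks and a router that sends each token to its top-k experts. The sizes, expert count, and k below are assumptions for illustration.

```python
# Minimal sketch of a sparse Mixture-of-Experts layer with top-k routing.
# Dimensions, number of experts, and k are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=768, d_ff=3072, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)       # token -> expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (tokens, d_model)
        scores = self.router(x)                              # (tokens, num_experts)
        weights, indices = scores.topk(self.k, dim=-1)       # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```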
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
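The sentence is truncated, but the in-context learning it refers to typically means conditioning a frozen LLM on a handful of demonstrations placed directly in the prompt. A minimal sketch, with a made-up sentiment task and prompt format:

```python
# Minimal sketch of few-shot in-context learning: demonstrations are concatenated
# into the prompt and the frozen LLM completes the final example. The task,
# format, and examples are illustrative assumptions.
demonstrations = [
    ("the movie was a delight from start to finish", "positive"),
    ("a tedious, joyless two hours", "negative"),
]
query = "surprisingly heartfelt and sharply written"

prompt = "".join(f"Review: {text}\nSentiment: {label}\n\n" for text, label in demonstrations)
prompt += f"Review: {query}\nSentiment:"

# The prompt is sent to the pretrained model as-is, with no gradient updates;
# the prediction is read off the model's continuation (e.g. "positive").
print(prompt)
```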
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
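The alternative to full fine-tuning is cut off here; as one common point of contrast, the sketch below compares updating every parameter with freezing the backbone and training only a small task head. The checkpoint name and the two-class head are illustrative assumptions.

```python
# Minimal sketch contrasting full fine-tuning with a lighter alternative that
# freezes the backbone and trains only a small task head.
import torch.nn as nn
from transformers import AutoModel

backbone = AutoModel.from_pretrained("bert-base-uncased")
head = nn.Linear(backbone.config.hidden_size, 2)        # task-specific classifier

# Full fine-tuning: every backbone and head parameter receives gradient updates.
full_count = sum(p.numel() for p in backbone.parameters()) \
           + sum(p.numel() for p in head.parameters())

# Head-only tuning: freeze the backbone so only the head is updated.
for p in backbone.parameters():
    p.requires_grad = False
head_count = sum(p.numel() for p in head.parameters())

print(f"full fine-tuning trains {full_count:,} parameters")
print(f"head-only tuning trains {head_count:,} parameters")
```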
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
We introduce BitFit, a sparse fine-tuning method where only the bias terms of the model (or a subset ...
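A minimal sketch of the bias-only idea, assuming a Hugging Face-style model where bias parameters can be selected by name. The name matching and the choice to keep the classification head trainable are implementation assumptions, not necessarily BitFit's exact recipe.

```python
# Minimal sketch of BitFit-style sparse fine-tuning: freeze everything except
# bias terms (and, here, the task head). Selecting biases by the "bias"
# substring in parameter names is an implementation assumption.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

for name, param in model.named_parameters():
    # Keep bias terms and the classification head trainable; freeze all other weights.
    param.requires_grad = ("bias" in name) or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable} of {total} parameters ({100 * trainable / total:.2f}%)")
```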
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PL...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Large-scale pre-trained language models have achieved impressive results on a wide range of downstre...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a...
Language model fine-tuning is essential for modern natural language processing, but is computational...
Parameter-efficient fine-tuning (PEFT) has shown its effectiveness in adapting the pre-trained langu...
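The abstract is truncated; as one common PEFT instantiation (not necessarily the one studied here), the sketch below adds a trainable low-rank update to a frozen linear layer, LoRA-style. Rank, scaling, and placement are illustrative assumptions.

```python
# Minimal sketch of one common PEFT technique, a LoRA-style low-rank update:
# the frozen weight W is augmented with a trainable low-rank product B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = linear
        self.base.weight.requires_grad = False               # freeze pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(linear.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus trainable low-rank correction.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
x = torch.randn(4, 768)
print(layer(x).shape)                                         # torch.Size([4, 768])
```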