Language model fine-tuning is essential for modern natural language processing, but it is computationally expensive and time-consuming. Further, the effectiveness of fine-tuning is limited by the inclusion of training examples that negatively affect performance. Here we present information gain filtration, a general method for improving both the training efficiency and the final performance of language model fine-tuning. We define the information gain of an example as the improvement on a test metric after training on that example. A secondary learner is then trained to approximate this quantity. During fine-tuning, this learner selects informative examples and skips uninformative ones. We show that our method has consis...
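To make the procedure concrete, the following is a minimal, self-contained sketch of information gain filtration on a toy regression problem. The toy model, the ridge-regression secondary learner, the per-example features, and all hyperparameters are illustrative assumptions, not the setup used in the work described above.

```python
# Toy sketch of information gain filtration (IGF). The "language model" is a
# linear regressor and the secondary learner is ridge regression; both are
# stand-ins chosen only to keep the example runnable, not the original setup.
import copy
import torch
from sklearn.linear_model import Ridge

torch.manual_seed(0)
w_true = torch.randn(8)

def make_example(noisy):
    # Noisy examples stand in for training data that hurts the objective metric.
    x = torch.randn(8)
    y = x @ w_true + (3.0 if noisy else 0.05) * torch.randn(())
    return x, y

corpus = [make_example(noisy=(i % 2 == 0)) for i in range(400)]
objective_set = [make_example(noisy=False) for _ in range(200)]

def heldout_loss(model, batch):
    # Test metric: mean squared error on the held-out objective set.
    with torch.no_grad():
        return torch.stack([(model(x).squeeze() - y) ** 2 for x, y in batch]).mean().item()

def sgd_step(model, example, lr=0.01):
    # One plain SGD update on a single example.
    x, y = example
    loss = (model(x).squeeze() - y) ** 2
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p -= lr * p.grad

def information_gain(model, example):
    # IG of an example: how much one update on it improves the test metric.
    trial = copy.deepcopy(model)
    before = heldout_loss(trial, objective_set)
    sgd_step(trial, example)
    return before - heldout_loss(trial, objective_set)

def features(example):
    # Cheap per-example features for the secondary learner (an assumption here).
    x, y = example
    return torch.cat([x, y.reshape(1)]).numpy()

# Train the secondary learner to predict IG from example features.
base = torch.nn.Linear(8, 1)
probe = corpus[:100]
secondary = Ridge().fit([features(ex) for ex in probe],
                        [information_gain(base, ex) for ex in probe])

# Fine-tune, training only on examples predicted to be informative.
model = torch.nn.Linear(8, 1)
for ex in corpus[100:]:
    if secondary.predict([features(ex)])[0] > 0.0:
        sgd_step(model, ex)
print("objective-set loss after filtered fine-tuning:", heldout_loss(model, objective_set))
```

The key design choice this sketch illustrates is that the expensive quantity (information gain, which requires a trial update and a held-out evaluation) is only measured on a small probe set; afterwards, the cheap secondary learner decides per example whether a real update is worth taking.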
Language models, given their black-box nature, often exhibit sensitivity to input perturbations, lea...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
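As a rough illustration of the bias-only idea, fine-tuning can be restricted to bias terms by freezing every other parameter. This is a sketch assuming a Hugging Face transformers BERT checkpoint and a classification head; the checkpoint name and optimizer settings are not taken from the paper.

```python
# Sketch of bias-only fine-tuning in the spirit of BitFit, using a Hugging Face
# transformers model. The checkpoint name, task head, and optimizer settings are
# illustrative assumptions, not prescriptions from the paper.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Freeze everything except bias terms and the newly initialized task head.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier.")

trainable = [p for p in model.parameters() if p.requires_grad]
print(f"trainable parameters: {sum(p.numel() for p in trainable):,} "
      f"of {sum(p.numel() for p in model.parameters()):,}")

optimizer = torch.optim.AdamW(trainable, lr=1e-4)
# ...the usual fine-tuning loop then updates only these parameters.
```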
Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PL...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
In recent years, language models (LMs) have made remarkable progress in advancing the field of natu...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Finetuning can be used to tackle domain-specific tasks by transferring knowledge learned from pre-tr...
The dominant approaches for controlling language models achieve prominence in controlling high-level...
Sparse Mixture-of-Experts (MoE) is a neural architecture design that can be utilized to add learnabl...
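For readers unfamiliar with the design, a sparse MoE block routes each token to a small subset of expert sub-networks, so parameters can be added without a proportional increase in per-token compute. Below is a minimal sketch with top-1 routing; the layer sizes, the routing rule, and the absence of load-balancing losses are simplifying assumptions made to keep the illustration short.

```python
# Sketch of a sparse Mixture-of-Experts (MoE) block with top-1 routing.
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)      # learned gating
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))

    def forward(self, x):                                 # x: (tokens, d_model)
        gates = self.router(x).softmax(dim=-1)            # routing probabilities
        weight, idx = gates.max(dim=-1)                   # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                               # tokens sent to expert e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(SparseMoE()(tokens).shape)                          # torch.Size([10, 64])
```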
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Widely used language models (LMs) are typically built by scaling up a two-stage training pipeline: a...
Neural language models often fail to generate diverse and informative texts, limiting their applicab...