Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained fro...
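The abstract above describes learning sparse fine-tunings via a Lottery Ticket-style procedure. As a rough illustration of that idea, here is a minimal PyTorch sketch of the two-phase recipe commonly associated with it: fine-tune freely, keep only the k parameters that moved the most, rewind to the pretrained weights, and fine-tune again with all other entries frozen, storing the result as a composable sparse difference vector. The helper names (`fine_tune`, `lottery_ticket_sft`), the Hugging Face-style model returning `.loss`, and the top-k-by-change selection heuristic are assumptions made for this sketch, not the paper's released implementation.

```python
import copy
import itertools
import torch

def fine_tune(model, data_loader, steps, lr=2e-5, mask=None):
    """One fine-tuning phase. If `mask` is given (dict: name -> 0/1 tensor),
    gradients are zeroed outside the mask so only the selected entries move.
    Weight decay is disabled so frozen entries stay exactly at their values."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.0)
    model.train()
    batches = itertools.cycle(data_loader)
    for _ in range(steps):
        batch = next(batches)
        loss = model(**batch).loss  # assumes a HF-style model whose forward returns .loss
        loss.backward()
        if mask is not None:
            for name, p in model.named_parameters():
                if p.grad is not None:
                    p.grad.mul_(mask[name])  # keep gradient only on selected entries
        opt.step()
        opt.zero_grad()
    return model

def lottery_ticket_sft(pretrained, data_loader, steps, k):
    """Two-phase sparse fine-tuning: pick the k weights that changed the most,
    then re-fine-tune only those, starting again from the pretrained weights."""
    theta0 = {n: p.detach().clone() for n, p in pretrained.named_parameters()}

    # Phase 1: unconstrained fine-tuning to find which parameters matter.
    phase1 = fine_tune(copy.deepcopy(pretrained), data_loader, steps)

    # Rank parameters by |theta_ft - theta_0| and keep the top-k as a binary mask
    # (ties may push the count slightly above k).
    diffs = torch.cat([(p.detach() - theta0[n]).abs().flatten()
                       for n, p in phase1.named_parameters()])
    threshold = torch.topk(diffs, k).values.min()
    mask = {n: ((p.detach() - theta0[n]).abs() >= threshold).float()
            for n, p in phase1.named_parameters()}

    # Phase 2: rewind to the pretrained weights, fine-tune only the masked entries.
    phase2 = fine_tune(copy.deepcopy(pretrained), data_loader, steps, mask=mask)

    # Store the sparse fine-tuning as a difference vector; composing, e.g., a
    # language SFT with a task SFT then amounts to adding both diffs to theta_0.
    return {n: (p.detach() - theta0[n]) * mask[n] for n, p in phase2.named_parameters()}
```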
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new langu...
Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP t...
State-of-the-art neural (re)rankers are notoriously data-hungry, which - given the lack of large-scal...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
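To make the bias-only idea concrete, the following is a minimal sketch of a BitFit-style setup with a Hugging Face classifier: every parameter is frozen except the bias terms (and, here, the freshly initialised classification head, which is a common choice but an assumption of this sketch, as is the `bert-base-uncased` checkpoint).

```python
import torch
from transformers import AutoModelForSequenceClassification

# Illustrative bias-only fine-tuning setup; model choice is an assumption.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

for name, param in model.named_parameters():
    # Keep bias terms (and the new classifier head) trainable, freeze the rest.
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier.")

# Only the unfrozen parameters are handed to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)

n_total = sum(p.numel() for p in model.parameters())
n_train = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"training {n_train}/{n_total} parameters ({100 * n_train / n_total:.2f}%)")
```

The point of the sketch is the parameter count printed at the end: the trainable subset is a small fraction of the full model, which is what makes this a sparse fine-tuning method.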
We consider the problem of accurate sparse fine-tuning of large language models (LLMs), that is, fin...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
With the increasing prevalence of Large Language Models, traditional full fine-tuning approaches fac...
Pre-trained multilingual language models show significant performance gains for zero-shot cross-ling...
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on ...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
Language model fine-tuning is essential for modern natural language processing, but is computational...
A recent family of techniques, dubbed as lightweight fine-tuning methods, facilitates parameter-effi...
Parameter-efficient fine-tuning (PEFT) methods can adapt large language models to downstream tasks b...