Large pre-trained language models have recently gained significant traction due to their improved performance on various downstream tasks like text classification and question answering, requiring only a few epochs of fine-tuning. However, their large model sizes often prohibit their application on resource-constrained edge devices. Existing solutions for yielding parameter-efficient BERT models largely rely on compute-exhaustive training and fine-tuning. Moreover, they often rely on additional compute-heavy models to mitigate the performance gap. In this paper, we present Sensi-BERT, a sensitivity-driven efficient fine-tuning of BERT models that can take an off-the-shelf pre-trained BERT model and yield highly parameter-efficient models for...
Limited computational budgets often prevent transformers from being used in production and from havi...
Fine-tuning pre-trained models has achieved impressive performance on standard natural language pro...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
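To make the BitFit recipe above concrete, here is a minimal sketch of bias-only fine-tuning, assuming the Hugging Face transformers library and a bert-base-uncased sequence-classification model; the checkpoint name, label count, and learning rate are illustrative choices, not taken from the paper.

```python
# Minimal sketch of BitFit-style bias-only fine-tuning (illustrative only).
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze all pre-trained weights except the bias terms; the randomly
# initialized task head ("classifier") is also left trainable, since it
# has no pre-trained values to preserve.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

# Hand only the trainable (bias + head) parameters to the optimizer,
# a tiny fraction of BERT-base's total parameter count.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
# A standard training loop then updates only these parameters.
```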
As language models have grown in parameters and layers, it has become much harder to train and infer...
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-tuning),...
Transformer-based language models have become a key building block for natural language processing. ...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
Transformer-based architectures have become the de facto models used for a range of Natural Language Pro...
Language model fine-tuning is essential for modern natural language processing, but is computational...
Recently, the development of pre-trained language models has brought natural language processing (NL...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Pre-training a language model and then fine-tuning it for downstream tasks has demonstrated state-of...