There is growing interest in adapting large-scale language models using parameter-efficient fine-tuning methods. However, accelerating the model itself and achieving better inference efficiency through model compression has not been thoroughly explored. Model compression could provide the benefits of reducing memory footprints, enabling low-precision computations, and ultimately achieving cost-effective inference. To combine parameter-efficient adaptation and model compression, we propose AlphaTuning, consisting of post-training quantization of the pre-trained language model and fine-tuning only some parts of the quantized parameters for a target task. Specifically, AlphaTuning works by employing binary-coding quantization, which factorizes...
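To make the factorization concrete, below is a minimal PyTorch sketch of greedy binary-coding quantization and of a linear layer whose binary codes are frozen while only the scaling factors (and bias) stay trainable, in the spirit of the abstract above. The function and class names (`bcq_quantize`, `AlphaTunedLinear`) are hypothetical and this is an illustrative sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bcq_quantize(W, num_bits=3):
    """Greedy binary-coding quantization: approximate W (out x in) as
    sum_k alpha_k * B_k with B_k in {-1, +1} and per-row scales alpha_k."""
    residual = W.clone()
    alphas, binaries = [], []
    for _ in range(num_bits):
        B = torch.sign(residual)
        B[B == 0] = 1.0
        alpha = (residual * B).mean(dim=1, keepdim=True)  # least-squares scale per row
        alphas.append(alpha)
        binaries.append(B)
        residual = residual - alpha * B
    return torch.cat(alphas, dim=1), torch.stack(binaries)  # (out, k), (k, out, in)

class AlphaTunedLinear(nn.Module):
    """Linear layer whose binary codes are frozen buffers; only the scaling
    factors (and bias) remain trainable for the downstream task."""
    def __init__(self, linear, num_bits=3):
        super().__init__()
        alphas, binaries = bcq_quantize(linear.weight.data, num_bits)
        self.register_buffer("binaries", binaries)        # frozen binary codes
        self.alphas = nn.Parameter(alphas)                 # task-specific, fine-tuned
        self.bias = nn.Parameter(linear.bias.data.clone()) if linear.bias is not None else None

    def forward(self, x):
        # Reconstruct W = sum_k alpha_k * B_k (written for clarity, not speed).
        W = torch.einsum("ok,koi->oi", self.alphas, self.binaries)
        return F.linear(x, W, self.bias)

layer = AlphaTunedLinear(nn.Linear(768, 768), num_bits=3)
```

Because the binary codes are registered as buffers rather than parameters, an optimizer built over `layer.parameters()` updates only the scaling factors and the bias.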
Neural data compression has been shown to outperform classical methods in terms of rate-distortion (...
The recent advance of self-supervised learning associated with the Transformer architecture enables ...
Multilingual models are often particularly dependent on scaling to generalize to a growing number of...
The increasing size of generative Pre-trained Language Models (PLMs) has greatly increased the deman...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
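As a concrete illustration of bias-only fine-tuning (a minimal sketch under assumed setup, not the paper's code), the trainable subset can be selected in PyTorch by toggling `requires_grad` on named parameters; the toy model here is a stand-in for a pre-trained network.

```python
import torch.nn as nn

# Toy stand-in for a pre-trained model (assumption for illustration).
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 2))

# BitFit-style sparse fine-tuning: train only bias terms, freeze everything else.
for name, param in model.named_parameters():
    param.requires_grad = name.endswith("bias")

print([n for n, p in model.named_parameters() if p.requires_grad])  # ['0.bias', '2.bias']
```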
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
In this paper, we move towards combining large parametric models with non-parametric prototypical ne...
Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resourc...
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updat...
Pre-trained language models (PLMs) have demonstrated impressive performance across various downstrea...
Prior work shows that it is possible to expand pretrained Masked Language Models (MLMs) to new langu...
Large language models (LLMs) and vision language models (VLMs) demonstrate excellent performance on ...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...
Large language models (LLMs) exhibit excellent performance across a variety of tasks, but they come w...
Transformer-based pre-trained models with millions of parameters require large storage. Recent appro...