The increasing size of generative Pre-trained Language Models (PLMs) has greatly increased the demand for model compression. While various methods have been proposed to compress BERT and its variants, few attempts have been made to compress generative PLMs, and the underlying difficulty remains unclear. In this paper, we compress generative PLMs by quantization. We find that previous quantization methods fail on generative tasks due to \textit{homogeneous word embeddings} caused by reduced capacity, and the \textit{varied distribution of weights}. Correspondingly, we propose a token-level contrastive distillation to learn distinguishable word embeddings, and a module-wise dynamic scaling to make quantizers adaptive to different modules. Empirical results on vario...
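To make the token-level contrastive distillation mentioned in the abstract above concrete, here is a minimal PyTorch sketch of an InfoNCE-style token loss: each quantized (student) token representation treats the full-precision (teacher) representation of the same token as its positive and the other tokens in the sequence as negatives. The function name, temperature value, and the choice of in-sequence tokens as negatives are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def token_contrastive_loss(student_h: torch.Tensor,
                           teacher_h: torch.Tensor,
                           temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style token-level distillation loss (illustrative sketch).

    student_h: (T, D) token representations from the quantized student.
    teacher_h: (T, D) token representations from the full-precision teacher.
    For each student token, the teacher representation of the same token is
    the positive; all other teacher tokens act as negatives, pulling matching
    tokens together and pushing different tokens apart.
    """
    s = F.normalize(student_h, dim=-1)   # compare in cosine-similarity space
    t = F.normalize(teacher_h, dim=-1)
    logits = s @ t.T / temperature       # (T, T) pairwise similarities
    targets = torch.arange(s.size(0), device=s.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)

# Usage: distill hidden states of a 4-token sequence with hidden size 8.
student = torch.randn(4, 8, requires_grad=True)
teacher = torch.randn(4, 8)
token_contrastive_loss(student, teacher).backward()
```

Keeping same-token pairs close while separating different tokens is what counters the homogeneous-word-embedding problem the abstract identifies: a plain distance-based distillation would only pull student tokens toward their teachers, without any force spreading distinct tokens apart.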
Large language models (LLMs) exhibit excellent performance across a variety of tasks, but they come w...
Large language models (LLMs) show excellent performance but are compute- and memory-intensive. Quant...
Scaling language models with more data, compute and parameters has driven significant progress in na...
There is growing interest in adapting large-scale language models using parameter-efficient fine-t...
The recent advance of self-supervised learning associated with the Transformer architecture enables ...
With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in...
Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major ...
Large Language Models (LLMs) from the GPT family have become extremely popular, leading to a race to...
Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classifi...
Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processin...
DNN-based speaker verification (SV) models demonstrate strong performance at relatively high co...
To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either ...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translati...