Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processing (NLP) tasks under the pre-training and fine-tuning paradigm. With their large numbers of parameters, PLMs are computation-intensive and resource-hungry. Hence, model pruning has been introduced to compress large-scale PLMs. However, most prior approaches only consider task-specific knowledge for downstream tasks but ignore the essential task-agnostic knowledge during pruning, which may cause catastrophic forgetting and lead to poor generalization ability. To maintain both task-agnostic and task-specific knowledge in our pruned model, we propose ContrAstive Pruning (CAP) under the paradigm of pre-training and fine-tuning. I...
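To make the idea concrete, below is a minimal sketch (not the authors' released implementation) of the kind of objective the CAP abstract describes: while pruning, the pruned model's representation is pulled toward both the original pre-trained model (task-agnostic knowledge) and the fine-tuned model (task-specific knowledge) with an InfoNCE-style contrastive loss, on top of the ordinary downstream task loss. The tensor shapes, temperature, and weighting coefficients are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of a CAP-style training objective: task loss plus two
# contrastive terms that keep the pruned model close to a pre-trained and
# a fine-tuned teacher. Shapes and hyperparameters are illustrative.
import torch
import torch.nn.functional as F


def contrastive_loss(anchor, positive, temperature=0.1):
    """InfoNCE loss: row i of `anchor` should match row i of `positive`
    and be pushed away from the other rows (in-batch negatives)."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature          # (B, B) similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


def cap_style_loss(pruned_repr, pretrained_repr, finetuned_repr, task_loss,
                   lambda_pre=1.0, lambda_fine=1.0):
    """Combine the downstream task loss with two contrastive terms so the
    pruned model retains both task-agnostic and task-specific knowledge.
    Teacher representations are detached so only the pruned model updates."""
    return (task_loss
            + lambda_pre * contrastive_loss(pruned_repr, pretrained_repr.detach())
            + lambda_fine * contrastive_loss(pruned_repr, finetuned_repr.detach()))


if __name__ == "__main__":
    B, H = 8, 768                      # batch size and hidden size (illustrative)
    pruned = torch.randn(B, H, requires_grad=True)
    pre, fine = torch.randn(B, H), torch.randn(B, H)
    loss = cap_style_loss(pruned, pre, fine, task_loss=torch.tensor(0.5))
    loss.backward()                    # gradients flow only into the pruned model
    print(float(loss))
```

In practice the three representations would come from the same input batch passed through the pruned model and the two frozen teachers; the demo above uses random tensors only to show that the objective is differentiable with respect to the pruned model alone.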
Deep networks are typically trained with many more parameters than the size of the training dataset....
Large, pre-trained models are problematic to use in resource constrained applications. Fortunately, ...
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end ...
Large Language Models have become the core architecture upon which most modern natural language proc...
Model compression by way of parameter pruning, quantization, or distillation has recently gained pop...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...
Transformer-based language models have become a key building block for natural language processing. ...
With the dramatically increased number of parameters in language models, sparsity methods have recei...
We study the impact of different pruning techniques on the representation learned by deep neural net...
Recent work has focused on compressing pre-trained language models (PLMs) like BERT where the major ...
The growing size of neural language models has led to increased attention in model compression. The ...
Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide ran...
The growing energy and performance costs of deep learning have driven the community to reduce the si...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the...