Recent work has focused on compressing pre-trained language models (PLMs) such as BERT, with the major emphasis on improving in-distribution performance on downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models on out-of-distribution (OOD) data. Towards this end, we study two popular model compression techniques, knowledge distillation and pruning, and show that although the compressed models obtain performance similar to their PLM counterparts on a task's in-distribution development sets, they are significantly less robust on OOD test sets. Further analysis indicates that the compressed models overfit to shortcut samples ...
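For readers unfamiliar with the first technique mentioned above, the following is a minimal sketch of a standard knowledge-distillation objective (temperature-scaled KL divergence combined with hard-label cross-entropy). It illustrates the generic technique only, not the specific training setup of the study in this abstract; the temperature and mixing weight are placeholder assumptions.

```python
# Generic knowledge-distillation loss sketch (not the paper's exact setup).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target KL loss with hard-label cross-entropy.

    temperature and alpha are illustrative defaults, not values from the study.
    """
    # Soft targets: match the teacher's temperature-softened distribution.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against the gold labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```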
Factorizing a large matrix into small matrices is a popular strategy for model compression. Singular...
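As a companion to the factorization idea above, here is a minimal sketch of low-rank weight compression via truncated SVD: a large weight matrix is approximated by the product of two thin factors. The rank and matrix size are illustrative assumptions, not values taken from this abstract.

```python
# Generic truncated-SVD factorization sketch for compressing a weight matrix.
import torch

def factorize_linear_weight(W: torch.Tensor, rank: int):
    """Approximate W (out_dim x in_dim) as A @ B, with A: out_dim x rank and B: rank x in_dim."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]  # absorb singular values into the left factor
    B = Vh[:rank, :]
    return A, B

# Usage: a d x d matrix (d*d params) becomes two factors with 2*d*rank params.
W = torch.randn(768, 768)
A, B = factorize_linear_weight(W, rank=64)
relative_error = torch.linalg.norm(W - A @ B) / torch.linalg.norm(W)
```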
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long infe...
Fine-tuning pre-trained models has achieved impressive performance on standard natural language pro...
Multilingual models are often particularly dependent on scaling to generalize to a growing number of...
Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processin...
Leveraging shared learning through Massively Multilingual Models, state-of-the-art machine translati...
Large language models (LLMs), while transformative for NLP, come with significant computational dema...
As language models increase in size by the day, methods for efficient inference are critical to leve...
Model compression by way of parameter pruning, quantization, or distillation has recently gained pop...
Transformer-based language models have become a key building block for natural language processing. ...
When considering a model architecture, there are several ways to reduce its memory footprint. Histor...
The increasing size of generative Pre-trained Language Models (PLMs) has greatly increased the deman...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
The growing size of neural language models has led to increased attention in model compression. The ...
The reusability of state-of-the-art Pre-trained Language Models (PLMs) is often limited by their gen...