As language models have grown in parameters and layers, training them and running inference on a single GPU has become much harder. This severely restricts the availability of large language models such as GPT-3, BERT-Large, and many others. A common technique for addressing this problem is pruning the network architecture by removing transformer heads, fully-connected weights, and other modules. The main challenge is to discern the important parameters from the less important ones. Our goal is to find strong metrics for identifying such parameters. We thus propose two strategies for calculating importance scores: Cam-Cut, based on GradCAM interpretations, and Smooth-Cut, based on SmoothGrad. Through this work, we show that our scoring...
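Since the abstract does not reproduce implementation details, the following is a minimal sketch of what SmoothGrad-style importance scoring for pruning could look like. The toy two-layer model, noise scale, sample count, and the weight-times-gradient saliency are illustrative assumptions; Smooth-Cut's actual formulation may differ.

```python
# Sketch: SmoothGrad-style importance scores for pruning fully-connected weights.
# All model/hyperparameter choices below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a language-model block: two fully-connected layers.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
inputs = torch.randn(16, 32)          # a batch of (already embedded) inputs
labels = torch.randint(0, 4, (16,))

def smoothgrad_importance(model, inputs, labels, n_samples=20, noise_std=0.1):
    """Average |weight * gradient| over noise-perturbed copies of the input."""
    scores = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
    for _ in range(n_samples):
        noisy = inputs + noise_std * torch.randn_like(inputs)
        model.zero_grad()
        loss = F.cross_entropy(model(noisy), labels)
        loss.backward()
        for name, p in model.named_parameters():
            scores[name] += (p.grad * p).abs() / n_samples
    return scores

scores = smoothgrad_importance(model, inputs, labels)

# Prune (zero out) the 50% of weights with the lowest importance in each layer.
with torch.no_grad():
    for name, p in model.named_parameters():
        if p.dim() < 2:               # skip biases
            continue
        threshold = scores[name].flatten().quantile(0.5)
        p.mul_((scores[name] >= threshold).float())
```

Averaging the saliency over noisy copies of the input, rather than using a single gradient, is the SmoothGrad idea: it reduces the noise of pointwise gradients and should give a more stable ranking of which weights (or heads) can be removed.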