Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior work on pruning Transformers requires retraining the models, which adds substantial training cost and deployment complexity and makes pruning difficult to apply in many practical settings. To address this, we propose a fast post-training pruning framework for Transformers that does not require any retraining. Given a resource constraint and a sample dataset, our framework automatically prunes the Transformer model using structured sparsity methods. To retain high accuracy without retraining, we introduce three novel techniques: (i) a lightweight mask search algorithm that finds which heads and filters to prune based on the Fisher information...
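A minimal sketch (not the paper's released implementation) of the Fisher-information head-scoring idea described above: attach a differentiable gate to each attention head, accumulate the squared gradient of the loss with respect to each gate over a small sample set as the diagonal empirical Fisher, and zero out the lowest-scoring heads. The toy model, random sample data, and the 50% head budget are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttention(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # One gate per head; kept at 1.0 and used only to collect gradients.
        self.head_mask = nn.Parameter(torch.ones(num_heads))

    def forward(self, x):                       # x: (batch, seq, dim)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = attn @ v                           # (B, heads, T, head_dim)
        ctx = ctx * self.head_mask.view(1, -1, 1, 1)   # gate each head's output
        return self.out(ctx.transpose(1, 2).reshape(B, T, D))

class TinyClassifier(nn.Module):
    def __init__(self, dim=64, num_heads=4, num_classes=2):
        super().__init__()
        self.attn = MaskedSelfAttention(dim, num_heads)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.cls(self.attn(x).mean(dim=1))

model = TinyClassifier()
fisher = torch.zeros_like(model.attn.head_mask)

# Accumulate the diagonal empirical Fisher information of each head gate
# over a sample set (random tensors stand in for the real sample dataset).
for _ in range(32):
    x = torch.randn(8, 16, 64)
    y = torch.randint(0, 2, (8,))
    loss = F.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    fisher += model.attn.head_mask.grad.detach() ** 2

# Keep the most important heads under an illustrative 50% head budget.
num_keep = model.attn.num_heads // 2
keep = torch.topk(fisher, num_keep).indices
with torch.no_grad():
    mask = torch.zeros_like(model.attn.head_mask)
    mask[keep] = 1.0
    model.attn.head_mask.copy_(mask)
print("Fisher scores:", fisher.tolist(), "kept heads:", keep.tolist())
```

In a full pipeline the same squared-gradient scores would be computed per layer for both attention heads and FFN filters, and the mask chosen to satisfy the given latency or FLOPs constraint; the sketch above only shows the head-scoring step.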
Introducing sparsity in a neural network has been an efficient way to reduce its complexity while ke...
We show for the first time that large-scale generative pretrained transformer (GPT) family models ca...
Pretrained transformer models have demonstrated remarkable performance across various natural langua...
This thesis addresses the crucial issue of deploying large Transformer models on resource-constraine...
The computation necessary for training Transformer-based language models has skyrocketed in recent y...
Deployment of Transformer models on edge devices is becoming increasingly challenging due to the exp...
Neural machine translation (NMT) strongly outperforms previous statistical techniques. With the eme...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
The growing size of neural language models has led to increased attention in model compression. The ...
Sparsifying the Transformer has garnered considerable interest, as training the Transformer is very ...
There has been an explosion of interest in designing high-performance Transformers. While Transforme...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in ha...
The great success of transformer-based models in natural language processing (NLP) has led to variou...