Pruning is an effective way to reduce the huge inference cost of Transformer models. However, prior work on pruning Transformers requires retraining the models, which adds substantial training cost and deployment complexity and makes pruning difficult to apply in many practical settings. To address this, we propose a fast post-training pruning framework for Transformers that does not require any retraining. Given a resource constraint and a sample dataset, our framework automatically prunes the Transformer model using structured sparsity methods. To retain high accuracy without retraining, we introduce three novel techniques: (i) a lightweight mask search algorithm that finds which heads and filters to prune based on the Fisher information...
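A minimal sketch (not the paper's released implementation) of the Fisher-information head-scoring idea described above: attach a differentiable gate to each attention head, accumulate the squared gradient of the loss with respect to each gate over a small sample set as the diagonal empirical Fisher, and zero out the lowest-scoring heads. The toy model, random sample data, and the 50% head budget are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedSelfAttention(nn.Module):
    def __init__(self, dim=64, num_heads=4):
        super().__init__()
        self.num_heads, self.head_dim = num_heads, dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)
        # One gate per head; kept at 1.0 and used only to collect gradients.
        self.head_mask = nn.Parameter(torch.ones(num_heads))

    def forward(self, x):                       # x: (batch, seq, dim)
        B, T, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        ctx = attn @ v                           # (B, heads, T, head_dim)
        ctx = ctx * self.head_mask.view(1, -1, 1, 1)   # gate each head's output
        return self.out(ctx.transpose(1, 2).reshape(B, T, D))

class TinyClassifier(nn.Module):
    def __init__(self, dim=64, num_heads=4, num_classes=2):
        super().__init__()
        self.attn = MaskedSelfAttention(dim, num_heads)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.cls(self.attn(x).mean(dim=1))

model = TinyClassifier()
fisher = torch.zeros_like(model.attn.head_mask)

# Accumulate the diagonal empirical Fisher information of each head gate
# over a sample set (random tensors stand in for the real sample dataset).
for _ in range(32):
    x = torch.randn(8, 16, 64)
    y = torch.randint(0, 2, (8,))
    loss = F.cross_entropy(model(x), y)
    model.zero_grad()
    loss.backward()
    fisher += model.attn.head_mask.grad.detach() ** 2

# Keep the most important heads under an illustrative 50% head budget.
num_keep = model.attn.num_heads // 2
keep = torch.topk(fisher, num_keep).indices
with torch.no_grad():
    mask = torch.zeros_like(model.attn.head_mask)
    mask[keep] = 1.0
    model.attn.head_mask.copy_(mask)
print("Fisher scores:", fisher.tolist(), "kept heads:", keep.tolist())
```

In a full pipeline the same squared-gradient scores would be computed per layer for both attention heads and FFN filters, and the mask chosen to satisfy the given latency or FLOPs constraint; the sketch above only shows the head-scoring step.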
Introducing sparsity in a neural network has been an efficient way to reduce its complexity while ke...
We show for the first time that large-scale generative pretrained transformer (GPT) family models ca...
Pretrained transformer models have demonstrated remarkable performance across various natural langua...
This thesis addresses the crucial issue of deploying large Transformer models on resource-constraine...
The computation necessary for training Transformer-based language models has skyrocketed in recent y...
Deployment of Transformer models on edge devices is becoming increasingly challenging due to the exp...
Neural machine translation (NMT) strongly outperforms previous statistical techniques. With the eme...
Transformer-based neural models are used in many AI applications. Training these models is expensive...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
Vision transformers (ViTs) have recently obtained success in many applications, but their intensive ...
The growing size of neural language models has led to increased attention in model compression. The ...
Sparsifying the Transformer has garnered considerable interest, as training the Transformer is very ...
There has been an explosion of interest in designing high-performance Transformers. While Transforme...
We revisit the design choices in Transformers, and propose methods to address their weaknesses in ha...
The great success of transformer-based models in natural language processing (NLP) has led to variou...