The growing size of neural language models has led to increased attention to model compression. The two predominant approaches are pruning, which gradually removes weights from a pre-trained model, and distillation, which trains a smaller, compact model to match a larger one. Pruning methods can significantly reduce model size but rarely achieve speedups as large as distillation does; distillation methods, however, require large amounts of unlabeled data and are expensive to train. In this work, we propose a task-specific, structured pruning method, CoFi (Coarse- and Fine-grained Pruning), which delivers highly parallelizable subnetworks and matches distillation methods in both accuracy and latency, without resorting to any unlabeled data. Our ...
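To make the coarse- and fine-grained pruning idea concrete, the sketch below (Python/PyTorch; class and mask names are hypothetical, not taken from the paper) gates whole attention heads and whole sublayers with coarse masks and individual FFN intermediate dimensions with a fine-grained mask. In a CoFi-style setup these masks would be relaxed binary gates trained jointly with a sparsity constraint and a distillation objective, rather than the plain learnable parameters used here for brevity.

```python
import torch
import torch.nn as nn

class CoarseFineMaskedLayer(nn.Module):
    """Minimal sketch of coarse- and fine-grained pruning masks on one
    transformer layer (hypothetical names, not the authors' implementation)."""

    def __init__(self, hidden=768, heads=12, ffn=3072):
        super().__init__()
        self.h, self.d = heads, hidden // heads
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.attn_out = nn.Linear(hidden, hidden)
        self.ffn_in = nn.Linear(hidden, ffn)
        self.ffn_out = nn.Linear(ffn, hidden)
        # Coarse masks gate whole units; the fine mask gates single dimensions.
        self.head_mask = nn.Parameter(torch.ones(heads))  # coarse: attention heads
        self.mha_mask = nn.Parameter(torch.ones(1))       # coarse: whole MHA sublayer
        self.ffn_mask = nn.Parameter(torch.ones(1))       # coarse: whole FFN sublayer
        self.int_mask = nn.Parameter(torch.ones(ffn))     # fine: FFN intermediate dims

    def forward(self, x):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.h, self.d).transpose(1, 2)
        k = k.view(b, t, self.h, self.d).transpose(1, 2)
        v = v.view(b, t, self.h, self.d).transpose(1, 2)
        scores = (q @ k.transpose(-2, -1)) / self.d ** 0.5
        heads = scores.softmax(-1) @ v                       # (b, heads, t, d)
        heads = heads * self.head_mask.view(1, -1, 1, 1)     # zero out pruned heads
        attn = heads.transpose(1, 2).reshape(b, t, -1)
        x = x + self.mha_mask * self.attn_out(attn)          # drop whole MHA if masked
        inter = torch.relu(self.ffn_in(x)) * self.int_mask   # zero out pruned dims
        x = x + self.ffn_mask * self.ffn_out(inter)          # drop whole FFN if masked
        return x

layer = CoarseFineMaskedLayer()
y = layer(torch.randn(2, 16, 768))   # masks start at 1.0, i.e. nothing pruned yet
```

Because pruned heads, dimensions, and sublayers are zeroed as contiguous structural units, masks that converge to zero can be physically removed from the weight matrices after training, which is what makes this kind of structured subnetwork highly parallelizable on standard hardware.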
Pruning neural networks has become popular in the last decade when it was shown that a large number ...
Though network pruning has gained popularity for reducing the complexity of convolutional neural networ...
Pre-trained Language Models (PLMs) have achieved great success in various Natural Language Processin...
Sparsity has become one of the promising methods to compress and accelerate Deep Neural Networks (DN...
Model compression by way of parameter pruning, quantization, or distillation has recently gained pop...
Deploying deep learning neural networks on edge devices to accomplish task-specific objectives in ...
Structured pruning is a commonly used convolutional neural network (CNN) compression approach. Pruni...
Large, pre-trained models are problematic to use in resource constrained applications. Fortunately, ...
Neural models have seen great success in computer vision in recent years, especially in fundamental...
Introducing sparsity in a neural network has been an efficient way to reduce its complexity while ke...
The powerful performance of deep learning is widely acknowledged. As research has deepened, neural ...
As language models increase in size by the day, methods for efficient inference are critical to leve...
Pruning is a compression method which aims to improve the efficiency of neural networks by reducing ...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
As language models have grown in parameters and layers, it has become much harder to train and infer...