OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization

Hu, Peng
Peng, Xi
Zhu, Hongyuan
Aly, Mohamed M. Sabry
Lin, Jie

Publication date

May 2021

DOI

10.1609/aaai.v35i9.16950

Publisher

Association for the Advancement of Artificial Intelligence

Abstract

As Deep Neural Networks (DNNs) are usually overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods, such as pruning and quantization, have been proposed to reduce the model size significantly; the key to these methods is finding a suitable compression allocation (e.g., pruning sparsity and quantization codebook) for each layer. Existing solutions obtain the compression allocation iteratively or manually while finetuning the compressed model, and thus suffer from low efficiency. Different from the prior art, we propose a novel One-shot Pruning-Quantization (OPQ) in this paper, which analytically s...
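To make "compression allocation" concrete, the following is a minimal sketch of what applying one allocation (a pruning sparsity plus a quantization codebook) to a single layer can look like. It assumes plain magnitude pruning followed by a uniform codebook over the surviving weights. This is a generic illustration, not OPQ's analytical allocation procedure, which the truncated abstract does not describe. The function names, layer names and the sparsity/bit-width values are hypothetical.

```python
# Hypothetical sketch: apply a per-layer (sparsity, bits) allocation using
# magnitude pruning + a uniform codebook. NOT the OPQ algorithm itself.

def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by ascending magnitude; the first n_prune are removed.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def uniform_codebook(weights, bits):
    """Build 2**bits evenly spaced values spanning [-max|w|, max|w|] of surviving weights."""
    max_abs = max((abs(w) for w in weights if w != 0.0), default=0.0)
    k = 2 ** bits
    step = 2 * max_abs / (k - 1)
    return [-max_abs + i * step for i in range(k)]


def compress_layer(weights, sparsity, bits):
    """Prune, then map each surviving weight to its nearest codebook entry."""
    pruned = prune_by_magnitude(weights, sparsity)
    codebook = uniform_codebook(pruned, bits)
    # Pruned weights stay exactly zero (they would be stored via a sparse mask).
    quantized = [
        0.0 if w == 0.0 else min(codebook, key=lambda c: abs(c - w))
        for w in pruned
    ]
    return quantized, codebook


if __name__ == "__main__":
    # Hypothetical allocation: layer name -> (pruning sparsity, codebook bits).
    allocation = {"conv1": (0.5, 4), "fc1": (0.5, 2)}
    layers = {
        "conv1": [0.12, -0.8, 0.05, 0.33, -0.41, 0.9],
        "fc1": [0.9, -0.05, 0.4, -0.7, 0.01, 0.3],
    }
    for name, w in layers.items():
        sparsity, bits = allocation[name]
        q, cb = compress_layer(w, sparsity, bits)
        print(name, [round(x, 3) for x in q])
```

In this framing, choosing the (sparsity, bits) pair for every layer is the allocation problem the abstract refers to; the sketch only shows how a given allocation would be applied, not how it is chosen.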
