Given a pre-trained BERT, how can we compress it to a fast and lightweight one while maintaining its accuracy? Pre-trained language models, such as BERT, are effective for improving the performance of natural language processing (NLP) tasks. However, heavy models like BERT suffer from large memory cost and long inference time. In this paper, we propose SENSIMIX (Sensitivity-Aware Mixed Precision Quantization), a novel quantization-based BERT compression method that considers the sensitivity...
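The abstract above describes sensitivity-aware mixed-precision quantization only at a high level. As a rough illustration of the general idea (not the SENSIMIX algorithm itself), the sketch below assigns a lower bit-width to layers a caller has marked as less sensitive; `quantize_symmetric`, `mixed_precision_quantize`, the per-layer `sensitivity` scores, and the chosen bit-widths and threshold are all hypothetical.

```python
# Minimal sketch of mixed-precision weight quantization: less sensitive
# layers receive fewer bits. This is an illustration, not SENSIMIX.
import torch

def quantize_symmetric(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantization of a weight tensor to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

def mixed_precision_quantize(state_dict, sensitivity, threshold=0.5,
                             high_bits=8, low_bits=2):
    """Quantize each weight matrix with a bit-width chosen by its sensitivity score."""
    out = {}
    for name, w in state_dict.items():
        if w.dim() < 2:
            out[name] = w  # keep biases / LayerNorm parameters in full precision
            continue
        bits = high_bits if sensitivity.get(name, 1.0) >= threshold else low_bits
        out[name] = quantize_symmetric(w, bits)
    return out
```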
Currently, the most widespread neural network architecture for training language models is the so-ca...
One-bit quantization is a general tool to execute a complex model, such as deep neural networks, on a...
How to train a binary neural network (BinaryNet) with both high compression rate and high accuracy o...
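The two snippets above concern one-bit (binary) quantization. Below is a minimal sketch of the common BinaryNet-style recipe, assuming per-tensor scaling and a straight-through estimator for the backward pass; `BinarizeSTE` and `BinaryLinear` are hypothetical names, not classes from the cited papers.

```python
# Illustrative 1-bit weight quantization: forward uses sign(w) scaled by the
# mean absolute value; backward passes gradients straight through to the
# latent full-precision weights (blocked where |w| > 1).
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        ctx.save_for_backward(w)
        alpha = w.abs().mean()          # per-tensor scaling factor
        return torch.sign(w) * alpha

    @staticmethod
    def backward(ctx, grad_output):
        (w,) = ctx.saved_tensors
        return grad_output * (w.abs() <= 1).float()

class BinaryLinear(torch.nn.Linear):
    def forward(self, x):
        return torch.nn.functional.linear(x, BinarizeSTE.apply(self.weight), self.bias)
```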
Transformer-based architectures have become de facto models used for a range of Natural Language Pro...
Transformer-based language models have become a key building block for natural language processing. ...
Large pre-trained language models have recently gained significant traction due to their improved pe...
Pre-trained language models of the BERT family have defined the state of the art in a wide range of...
The increasing size of generative Pre-trained Language Models (PLMs) has greatly increased the deman...
As language models have grown in parameters and layers, it has become much harder to train and infer...
Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many pri...
In this position statement, we wish to contribute to the discussion about how to assess quality and ...
Large Language Models have become the core architecture upon which most modern natural language proc...
Model compression by way of parameter pruning, quantization, or distillation has recently gained pop...
We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset ...
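The BitFit snippet describes fine-tuning only the bias terms of a pre-trained model. A minimal sketch, assuming a Hugging Face `transformers` BERT classifier; keeping the task-specific classification head trainable is a common choice here, not something the truncated snippet specifies.

```python
# BitFit-style sparse fine-tuning: freeze every parameter except bias terms
# (and, here, the task-specific classifier head).
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

for name, param in model.named_parameters():
    param.requires_grad = ("bias" in name) or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")
```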
We find that at sequence length 512, padding tokens represent in excess of 50% of the Wikipedia datas...
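To make the padding-overhead claim above concrete, here is a small sketch that measures what fraction of tokens is padding when every example is padded to a fixed length of 512; the tokenizer and the toy texts are placeholders, not the paper's actual setup.

```python
# Estimate the padding fraction of a corpus padded to a fixed length of 512.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
texts = ["a short example", "a slightly longer example sentence"]  # placeholder corpus

real_tokens = sum(len(tokenizer(t, truncation=True, max_length=512)["input_ids"])
                  for t in texts)
padded_tokens = 512 * len(texts)
print(f"padding fraction: {1 - real_tokens / padded_tokens:.2%}")
```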