Transformer-based architectures have become the de facto models for a range of Natural Language Processing tasks. In particular, BERT-based models achieved significant accuracy gains on GLUE tasks, CoNLL-03, and SQuAD. However, BERT-based models have a prohibitive memory footprint and latency. As a result, deploying BERT-based models in resource-constrained environments has become challenging. In this work, we perform an extensive analysis of fine-tuned BERT models using second-order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra-low precision. In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further.
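To make the group-wise idea concrete, the sketch below quantizes a weight matrix in contiguous groups, each with its own scale, using plain symmetric uniform quantization. This is a minimal NumPy illustration under our own assumptions (the function name, the 4-bit default, and group_size=128 are hypothetical choices), not the paper's implementation, which additionally uses the Hessian analysis to assign mixed per-layer bit widths.

```python
# Minimal sketch of group-wise uniform (symmetric) quantization.
# All names, the 4-bit default, and group_size=128 are illustrative
# assumptions, not the Q-BERT implementation.
import numpy as np

def quantize_groupwise(weights, num_bits=4, group_size=128):
    """Quantize `weights` in contiguous groups, each with its own scale,
    and return the dequantized (reconstructed) values for inspection."""
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 7 for a 4-bit symmetric range
    flat = weights.astype(np.float32).reshape(-1)
    out = np.empty_like(flat)
    for start in range(0, flat.size, group_size):
        group = flat[start:start + group_size]
        scale = np.abs(group).max() / qmax    # per-group scaling factor
        if scale == 0.0:                      # all-zero group: avoid divide-by-zero
            scale = 1.0
        q = np.clip(np.round(group / scale), -qmax, qmax)
        out[start:start + group_size] = q * scale
    return out.reshape(weights.shape)

# Example: quantize a BERT-sized projection matrix and check the error.
w = np.random.randn(768, 768).astype(np.float32)
w_q = quantize_groupwise(w, num_bits=4, group_size=128)
print("mean abs reconstruction error:", float(np.abs(w - w_q).mean()))
```

The intent of the finer group granularity is to give each group of weights its own dynamic range, so that a few outlier values in one group do not force a coarse quantization scale on all the others.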
Can we utilize extremely large monolingual text to improve neural machine translation without the ex...
Neural networks are powerful tools for decision making and for solving complex problems in r...
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better variant RoBERTa, has be...
Transformer-based language models have become a key building block for natural language processing. ...
Pre-trained language models of the BERT family have defined the state of the art in a wide range of...
Large pre-trained language models have recently gained significant traction due to their improved pe...
This model is fine-tuned and quantized based on a pre-trained huggingface BERT model. The quantizatio...
In this position statement, we wish to contribute to the discussion about how to assess quality and ...
This model is fine-tuned based on MLPerf Inference BERT PyTorch Model on SQuAD v1.1 dataset and conv...
Currently, the most widespread neural network architecture for training language models is the so-ca...
The increasing size of generative Pre-trained Language Models (PLMs) has greatly increased the deman...
As language models have grown in parameters and layers, it has become much harder to train and infer...
Pre-training complex language models is essential for the success of the recent methods such as BERT...
Transformer models perform well on Natural Language Processing and Natural Language Understanding ta...