In the natural language processing (NLP) literature, neural networks are becoming increasingly deep and complex. Recent advances in neural NLP are driven by large pretrained language models (e.g., BERT), which lead to significant performance gains on various downstream tasks. Such models, however, require intensive computational resources to train and are difficult to deploy in practice due to poor inference-time efficiency. In this thesis, we address this problem through knowledge distillation (KD), in which a large pretrained model serves as the teacher and transfers its knowledge to a small student model. We also aim to demonstrate the competitiveness of small, shallow neural networks. We propose a simple yet effective approach tha...
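As a point of reference for how a teacher transfers knowledge to a student, the sketch below implements the standard KD objective of Hinton et al. (2015), not necessarily the specific approach proposed in this thesis. It assumes a PyTorch classification setting; the function name `kd_loss` and the hyperparameters `T` (temperature) and `alpha` (interpolation weight) are illustrative, not taken from the source.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard knowledge-distillation objective (Hinton et al., 2015).

    Mixes a soft-target term (KL divergence between temperature-scaled
    teacher and student distributions) with the usual hard-label
    cross-entropy. `T` and `alpha` are illustrative hyperparameters.
    """
    # Soften both distributions with temperature T before comparing them.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays
    # comparable to the hard-label term as T varies.
    distill = F.kl_div(soft_student, soft_teacher,
                       log_target=True, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard
```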
Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in m...
Substantial progress has been made in the field of natural language processing (NLP) due to the adve...
Large-scale pretrained language models have led to significant improvements in Natural Language Proc...
Since the first bidirectional deep learning model for natural language understanding, BERT, emerge...
Building a neural language model from scratch involves a large number of design decisions. Y...
Although pre-trained language models such as BERT have achieved appealing performance in a wide range...
One of the main problems in the field of Artificial Intelligence is the efficiency of neural network...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Deep and large pre-trained language models (e.g., BERT, GPT-3) are state-of-the-art for various natu...
Knowledge Distillation (KD) consists of transferring “knowledge” from one machine learning model (th...
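In the canonical formulation of Hinton et al. (2015), the "knowledge" takes the form of soft targets: the teacher's logits $z_i$ are softened by a temperature $T$ before the student is trained to match them. The notation below is the standard one, assumed here rather than taken from this snippet:

$$p_i^{(T)} = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

The student then minimizes a temperature-scaled KL divergence to these soft targets, typically mixed with the usual hard-label cross-entropy, as in the code sketch above.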
Recent advances with large-scale pre-trained language models (e.g., BERT) have brought significant p...
Large pretrained language models have achieved state-of-the-art results on a variety of downstream t...
In recent years, deep neural networks have been successful in both industry and academia, especially...