In the natural language processing (NLP) literature, neural networks are becoming increasingly deep and complex. Recent advances in neural NLP are driven by large pretrained language models (e.g., BERT), which lead to significant performance gains on various downstream tasks. Such models, however, require intensive computational resources to train and are difficult to deploy in practice due to poor inference-time efficiency. In this thesis, we address this problem through knowledge distillation (KD), in which a large pretrained model serves as the teacher and transfers its knowledge to a small student model. We also aim to demonstrate the competitiveness of small, shallow neural networks. We propose a simple yet effective approach tha...
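As a point of reference for how a teacher transfers knowledge to a student, the sketch below implements the standard KD objective of Hinton et al. (2015), not necessarily the specific approach proposed in this thesis. It assumes a PyTorch classification setting; the function name `kd_loss` and the hyperparameters `T` (temperature) and `alpha` (interpolation weight) are illustrative, not taken from the source.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Standard knowledge-distillation objective (Hinton et al., 2015).

    Mixes a soft-target term (KL divergence between temperature-scaled
    teacher and student distributions) with the usual hard-label
    cross-entropy. `T` and `alpha` are illustrative hyperparameters.
    """
    # Soften both distributions with temperature T before comparing them.
    soft_teacher = F.log_softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays
    # comparable to the hard-label term as T varies.
    distill = F.kl_div(soft_student, soft_teacher,
                       log_target=True, reduction="batchmean") * (T * T)
    # Ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard
```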
Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in m...
Substantial progress has been made in the field of natural language processing (NLP) due to the adve...
Large-scale pretrained language models have led to significant improvements in Natural Language Proc...
Since the first bidirectional deep learning model for natural language understanding, BERT, emerge...
Building a neural language model from scratch involves a large number of design decisions. Y...
Although pre-trained language models such as BERT have achieved appealing performance in a wide range...
One of the main problems in the field of Artificial Intelligence is the efficiency of neural network...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Deep and large pre-trained language models (e.g., BERT, GPT-3) are state-of-the-art for various natu...
Knowledge Distillation (KD) consists of transferring “knowledge” from one machine learning model (th...
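In the canonical formulation of Hinton et al. (2015), the "knowledge" takes the form of soft targets: the teacher's logits $z_i$ are softened by a temperature $T$ before the student is trained to match them. The notation below is the standard one, assumed here rather than taken from this snippet:

$$p_i^{(T)} = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

The student then minimizes a temperature-scaled KL divergence to these soft targets, typically mixed with the usual hard-label cross-entropy, as in the code sketch above.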
Recent advances with large-scale pre-trained language models (e.g., BERT) have brought significant p...
Large pretrained language models have achieved state-of-the-art results on a variety of downstream t...
In recent years, deep neural networks have been successful in both industry and academia, especially...