Given a large Transformer model, how can we obtain a small and computationally efficient model which maintains the performance of the original model? Transformers have shown significant performance improvements on many NLP tasks in recent years. However, their large size, expensive computational cost, and long inference time make it challenging to deploy them to resource-constrained devices. Existing Transformer compression methods mainly focus on reducing the size of the encoder, ignoring the fact that the decoder accounts for the major portion of the long inference time. In this paper, we propose PET (Parameter-Efficient knowledge distillation on Transformer), an efficient Transformer compression method that reduces the size of both the encoder and the decoder.
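As a concrete but simplified illustration of the knowledge-distillation setup that compression methods of this kind build on, the sketch below trains a smaller student model against a larger teacher with a combined soft-label/hard-label objective. This is a minimal generic sketch, not PET's actual algorithm; the function name `distillation_loss` and the `temperature`, `alpha`, and `pad_id` values are illustrative assumptions.

```python
# Minimal sketch of knowledge distillation for a sequence model (e.g. an
# encoder-decoder Transformer). Generic soft-label KD, NOT the PET objective.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5, pad_id=0):
    """Mix a soft-label KL term (teacher -> student) with hard-label CE.

    student_logits, teacher_logits: (batch, tgt_len, vocab)
    labels: (batch, tgt_len) gold target token ids
    """
    # Soft targets: pull the student's output distribution toward the teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the gold output tokens.
    hard = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)),
        labels.reshape(-1),
        ignore_index=pad_id,
    )
    return alpha * soft + (1.0 - alpha) * hard
```

In a typical training loop, the teacher is run in evaluation mode with gradients disabled to produce `teacher_logits`, while only the compact student's parameters are updated with this loss.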