Neural machine translation (NMT) systems have greatly improved the quality of machine translation (MT) compared to statistical machine translation (SMT) systems. However, state-of-the-art NMT models require far more compute and data than SMT models, a cost that is unsustainable in the long run and of very limited benefit in low-resource scenarios. To some extent, model compression, more specifically state-of-the-art knowledge distillation techniques, can remedy this. In this work, we investigate knowledge distillation on a simulated low-resource German-to-English translation task. We show that sequence-level knowledge distillation can be used to train small student models on knowledge distilled from large teache...
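The core idea of sequence-level knowledge distillation (Kim and Rush, 2016) referenced above is simple: the teacher's beam-search translations replace the gold references as the student's training targets. A minimal sketch of that data pipeline, where the toy `teacher_translate` lookup is a placeholder standing in for a real teacher model's decoder:

```python
# Minimal sketch of sequence-level knowledge distillation.
# The teacher's decoded outputs become the student's training targets;
# the student is then trained with ordinary cross-entropy on these pairs,
# imitating the teacher's mode rather than the gold references.

def distill_corpus(source_sentences, teacher_translate):
    """Build the student's training pairs from teacher translations."""
    return [(src, teacher_translate(src)) for src in source_sentences]

# Toy "teacher": a lookup table standing in for beam-search decoding
# with a large trained NMT model (hypothetical data, for illustration).
toy_teacher = {"ein Haus": "a house", "der Hund": "the dog"}.get

corpus = ["ein Haus", "der Hund"]
student_data = distill_corpus(corpus, toy_teacher)
```

In practice `teacher_translate` would run beam search with the large teacher model over the full (possibly monolingual) source side, and the small student would be trained from scratch on the resulting pseudo-parallel corpus.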
Direct speech-to-text translation (ST) is an emerging approach that consists in performing the ST ta...
GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various n...
Unlike traditional statistical MT, which decomposes the translation task into distinct s...
Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Mach...
Natural Language Processing (NLP) has seen tremendous improvements over the last few years. Transfor...
Recently, deep models have shown tremendous improvements in neural machine translation (NMT). Howeve...
Benefiting from the sequence-level knowledge distillation, the Non-Autoregressive Transformer (NAT) ...
Pre-training and fine-tuning have achieved great success in the natural language processing field. The stan...
Deep and large pre-trained language models (e.g., BERT, GPT-3) are state-of-the-art for various natu...
Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation b...