GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (CTNMT) that is key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed CTNMT consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning paces according to a scheduled policy. Our experiments in machine translation sh...
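To make the dynamic switching gate and asymptotic distillation concrete, the sketch below shows one plausible way such components could be implemented; the module, function, and parameter names (DynamicSwitchGate, asymptotic_distillation_loss, proj_lm, proj_nmt) are illustrative assumptions and are not taken from the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicSwitchGate(nn.Module):
    """Gated fusion of pre-trained LM states and NMT encoder states.

    g = sigmoid(W * h_lm + U * h_nmt);  fused = g * h_lm + (1 - g) * h_nmt
    (A sketch of a dynamic switching gate; names are illustrative.)
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.proj_lm = nn.Linear(d_model, d_model, bias=False)   # "W", illustrative
        self.proj_nmt = nn.Linear(d_model, d_model, bias=True)   # "U", illustrative

    def forward(self, h_lm: torch.Tensor, h_nmt: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.proj_lm(h_lm) + self.proj_nmt(h_nmt))
        return gate * h_lm + (1.0 - gate) * h_nmt


def asymptotic_distillation_loss(h_nmt: torch.Tensor, h_lm: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss pulling NMT encoder states toward the frozen LM states,
    so the NMT model retains pre-trained knowledge during fine-tuning."""
    return F.mse_loss(h_nmt, h_lm.detach())


if __name__ == "__main__":
    batch, seq_len, d_model = 2, 5, 16
    h_lm = torch.randn(batch, seq_len, d_model)   # states from a frozen pre-trained LM
    h_nmt = torch.randn(batch, seq_len, d_model)  # states from the NMT encoder
    fused = DynamicSwitchGate(d_model)(h_lm, h_nmt)
    print(fused.shape, asymptotic_distillation_loss(h_nmt, h_lm).item())
```

In CTNMT-style training, a distillation term of this kind would typically be added to the translation cross-entropy loss, with its weight following the scheduled policy mentioned above; the exact weighting schedule is an assumption here.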
Pre-trained models have revolutionized the natural language processing field by leveraging large-sca...
Neural machine translation (NMT) is often described as ‘data hungry’ as it typically requires large ...
Various studies show that pretrained language models such as BERT cannot straightforwardly replace e...
GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various n...
Can we utilize extremely large monolingual text to improve neural machine translation without the ex...
Pre-training and fine-tuning have achieved great success in the natural language processing field. The stan...
Pre-training and fine-tuning have become the de facto paradigm in many natural language processing (...
This paper considers continual learning of a large-scale pretrained neural machine translation model w...
Unlike traditional statistical MT, which decomposes the translation task into distinct s...
Transfer learning applies knowledge or patterns learned in a particular field or task to differe...
Neural machine translation (NMT), where neural networks are used to generate translations, has revol...
Recently, the development of pre-trained language models has brought natural language processing (NL...
With the advent of deep neural networks in recent years, Neural Machine Translation (NMT) systems ha...
Recent literature has demonstrated the potential of multilingual Neural Machine Translation (mNMT) m...
Neural machine translation (NMT) has achieved notable success in recent times; however, it is also wi...