Non-autoregressive translation (NAT) models remove the dependence on previous target tokens and generate all target tokens in parallel, resulting in significant inference speedup but at the cost of inferior translation accuracy compared to autoregressive translation (AT) models. Considering that AT models have higher accuracy and are easier to train than NAT models, and both of them share the same model configurations, a natural idea to improve the accuracy of NAT models is to transfer a well-trained AT model to an NAT model through fine-tuning. However, since AT and NAT models differ greatly in training strategy, straightforward fine-tuning does not work well. In this work, we introduce curriculum learning into fine-tuning for NAT. Specifi...
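To make the decoding difference described above concrete, here is a minimal Python sketch (an illustration only, not taken from any of the papers summarized here); the decode_step and decode_all methods are hypothetical stand-ins for a Transformer decoder interface:

```python
# Minimal sketch of autoregressive (AT) vs. non-autoregressive (NAT) decoding.
# `model.decode_step` and `model.decode_all` are assumed, hypothetical
# interfaces, not part of any specific NAT implementation.

def autoregressive_decode(model, src, max_len, bos_id, eos_id):
    """AT: generate one token per step, each conditioned on all previously
    generated target tokens, so decoding takes up to max_len sequential passes."""
    tgt = [bos_id]
    for _ in range(max_len):
        next_token = model.decode_step(src, tgt)  # most likely next token id
        tgt.append(next_token)
        if next_token == eos_id:
            break
    return tgt[1:]


def non_autoregressive_decode(model, src, tgt_len):
    """NAT: predict all tgt_len positions in a single parallel pass, with no
    dependence on previously generated target tokens."""
    return model.decode_all(src, tgt_len)  # list of tgt_len token ids
```

The single parallel pass is what yields the inference speedup, while dropping the conditioning on previous target tokens is what costs accuracy relative to AT decoding.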
Non-autoregressive neural machine translation (NAT) generates each target word in parallel and has a...
In recent years, a number of methods for improving the decoding speed of neural machine translation ...
How do we perform efficient inference while retaining high translation quality? Existing neural mach...
Non-autoregressive translation (NAT) models, which remove the dependence on previous target tokens f...
Non-autoregressive neural machine translation (NAT) models suffer from the multi-modality problem th...
Non-Autoregressive Neural Machine Translation (NAT) achieves significant decoding speedup through ge...
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation (NMT) to ...
Non-autoregressive approaches aim to improve the inference speed of translation models by only requi...
Neural machine translation (NMT) has become the de facto standard in the machine translation communi...
Unlike the traditional statistical MT that decomposes the translation task into distinct s...
The competitive performance of neural machine translation (NMT) critically relies on large amounts o...
Benefiting from the sequence-level knowledge distillation, the Non-Autoregressive Transformer (NAT) ...
Non-autoregressive neural machine translation (NAT) models are proposed to accelerate the inference ...
As a new neural machine translation approach, Non-Autoregressive machine Translation (NAT) has attrac...
Non-autoregressive approaches aim to improve the inference speed of translation models, particularly...