This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers. In addition to confirming the general mantra "more data and larger models", we address scaling to multiple GPUs and provide practical tips for improved training regarding batch size, learning rate, warmup steps, maximum sentence length and checkpoint averaging. We hope that our observations will allow others to get better results given their particular hardware and data constraints.
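To make the warmup and checkpoint-averaging recommendations concrete, the sketch below shows the inverse-square-root learning-rate schedule with linear warmup from Vaswani et al. (2017) and a minimal checkpoint-averaging routine built on TensorFlow's checkpoint reader. The `warmup_steps=16000` value and the `average_checkpoints` helper are illustrative assumptions for this sketch, not the Tensor2Tensor implementation; Tensor2Tensor applies an additional constant learning-rate multiplier and ships its own averaging utility (avg_checkpoints), which also handles writing the averaged checkpoint back to disk.

```python
import numpy as np
import tensorflow as tf


def transformer_lr(step, d_model=512, warmup_steps=16000):
    """Learning rate schedule from Vaswani et al. (2017), Eq. 3.

    Rises linearly for the first `warmup_steps` updates, then decays
    proportionally to the inverse square root of the step number.
    warmup_steps=16000 is an illustrative setting, not a T2T default.
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


def average_checkpoints(checkpoint_paths):
    """Arithmetic mean of all float variables across several checkpoints.

    A minimal sketch of checkpoint averaging; it returns a dict of
    averaged numpy arrays rather than saving a new checkpoint.
    """
    averaged = {}
    n = float(len(checkpoint_paths))
    for path in checkpoint_paths:
        reader = tf.train.load_checkpoint(path)
        for name in reader.get_variable_to_shape_map():
            value = reader.get_tensor(name)
            if not np.issubdtype(value.dtype, np.floating):
                continue  # skip non-float variables such as the global step
            averaged[name] = averaged.get(name, 0.0) + value / n
    return averaged
```

Averaging the last few checkpoints in this way is a cheap post-training step: it requires no extra training and typically smooths out the noise of individual checkpoints, which is why the article recommends it.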