This work presents an extensive analysis of the Transformer architecture in the Neural Machine Translation (NMT) setting. Focusing on the encoder-decoder attention mechanism, we show that attention weights systematically make alignment errors by relying mainly on uninformative tokens from the source sequence. However, we observe that NMT models assign attention to these tokens in order to regulate the relative contribution to the prediction of the two contexts: the source sentence and the prefix of the target sequence. We provide evidence of the influence of wrong alignments on the model's behavior, demonstrating that the encoder-decoder attention mechanism is well suited as an interpretability method for NMT. Finally, based on our analysis, we propose methods that...
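The encoder-decoder attention analyzed in the abstract above can be sketched minimally as scaled dot-product attention of one decoder position over the source (encoder) states. This is a toy illustration, not the paper's code; the function name, dimensions, and random inputs are illustrative assumptions.

```python
import numpy as np

def cross_attention_weights(decoder_state, encoder_states, d_k):
    """Scaled dot-product attention weights of a single decoder position
    over the encoder states, as in the Transformer's cross-attention.
    Returns a probability distribution over source tokens."""
    # similarity of the decoder query with each source key, scaled by sqrt(d_k)
    scores = encoder_states @ decoder_state / np.sqrt(d_k)  # shape: (src_len,)
    scores -= scores.max()                                  # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()                          # softmax: sums to 1

# Toy example: 4 source tokens, model dimension 8 (illustrative values).
rng = np.random.default_rng(0)
enc = rng.normal(size=(4, 8))   # encoder states, one row per source token
dec = rng.normal(size=(8,))     # one decoder query vector
w = cross_attention_weights(dec, enc, d_k=8)
```

Each row of such a weight vector is what attention heatmaps visualize; the abstract's point is that high weight on a source token need not indicate an informative alignment to it.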
Can we trust that the attention heatmaps produced by a neural machine translation (NMT) model reflec...
Transformer-based models have brought a radical change to neural machine translation. A key feature ...
Language Generation Models produce words based on the previous context. Although existing methods of...
In Neural Machine Translation (NMT), each token prediction is conditioned on the source sentence and...
The attention mechanism in Neural Machine Translation (NMT) models added flexibility to translation ...
The encoder-decoder with attention model has become the state of the art for machine translation. Ho...
In this thesis, I explore neural machine translation (NMT) models via targeted investigation of vari...
Transformer is a neural machine translation model which revolutionizes machine translation. Compared...
Neural machine translation (NMT) has achieved new state-of-the-art performance in translating ambigu...
Lexically constrained neural machine translation (NMT), which leverages pre-specified translation to...
Machine translation, the task of automatically translating text from one natural language into anoth...
Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to th...
Attention-based autoregressive models have achieved state-of-the-art performance in various sequence...
In Neural Machine Translation (and, more generally, conditional language modeling), the generation o...