Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that most attention heads learn simple, and often redundant, positional patterns. In this paper, we propose to replace all but one attention head of each encoder layer with simple fixed -- non-learnable -- attentive patterns that are solely based on position and do not require any external knowledge. Our experiments with different data sizes and multiple language pairs show that fixing the attention heads on the encoder side of the Transformer at train...
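To make the fixed-pattern idea concrete, here is a minimal sketch of an encoder attention layer in which every head but one uses an attention matrix computed from token position alone (e.g., attend to the previous, current, or next token), while a single head keeps learnable query/key projections. The class name, the particular offset choices, and the module layout are illustrative assumptions for this sketch, not the paper's released implementation.

```python
# Sketch only: fixed (non-learnable) positional attention heads plus one learnable head.
import torch
import torch.nn.functional as F


def fixed_attention_weights(seq_len: int, offset: int) -> torch.Tensor:
    """Attention matrix where position i attends fully to position i + offset
    (clamped to the sequence boundaries). No parameters are learned."""
    targets = (torch.arange(seq_len) + offset).clamp(0, seq_len - 1)
    return F.one_hot(targets, num_classes=seq_len).float()  # (seq_len, seq_len)


class FixedHeadEncoderAttention(torch.nn.Module):
    """Multi-head encoder attention where all heads but one follow fixed
    positional patterns; only the last head keeps learned query/key projections."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Value and output projections are still learned for every head.
        self.w_v = torch.nn.Linear(d_model, d_model)
        self.w_o = torch.nn.Linear(d_model, d_model)
        # Learnable query/key projections only for the single learnable head.
        self.w_q = torch.nn.Linear(d_model, self.d_head)
        self.w_k = torch.nn.Linear(d_model, self.d_head)
        # Positional offsets for the fixed heads (an assumed choice, supports up to 8 heads).
        self.offsets = [-3, -2, -1, 0, 1, 2, 3][: n_heads - 1]

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        batch, seq_len, _ = x.shape
        v = self.w_v(x).view(batch, seq_len, self.n_heads, self.d_head)
        heads = []
        # Fixed heads: attention weights depend only on position.
        for h, offset in enumerate(self.offsets):
            attn = fixed_attention_weights(seq_len, offset).to(x.device)
            heads.append(attn @ v[:, :, h])  # (batch, seq, d_head)
        # One ordinary learnable head with scaled dot-product attention.
        q, k = self.w_q(x), self.w_k(x)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        heads.append(attn @ v[:, :, self.n_heads - 1])
        return self.w_o(torch.cat(heads, dim=-1))


# Usage: layer = FixedHeadEncoderAttention(d_model=512, n_heads=8)
#        out = layer(torch.randn(2, 10, 512))  # -> (2, 10, 512)
```

Because the fixed heads need no query/key parameters, such a layer trains fewer weights per encoder block while preserving the positional patterns that learned heads tend to converge to anyway.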
Though early successes of Statistical Machine Translation (SMT) systems are attributed in part to th...
Humans benefit from communication but suffer from language barriers. Machine translation (MT) aims t...
The powerful modeling capabilities of all-attention-based transformer architectures often cause over...
Neural machine translation has lately been established as the new state of the art in machine transl...
We explore the suitability of self-attention models for character-level neural machine translation. ...
Multi-head self-attention is a key component of the Transformer, a state-of-the-art architecture for...
Machine translation has received significant attention in the field of natural language processing n...
The Transformer is a neural machine translation model which revolutionizes machine translation. Compared...
The utility of linguistic annotation in neural machine translation seemed to have been established in...
The Transformer model is a very recent, fast and powerful discovery in neural machine translation. W...
Recent studies on the analysis of the multilingual representations focus on id...
Automatic post-editing (APE) is the study of correcting translation errors in the output of an unkno...
The Transformer model (Vaswani et al. 2017) has been widely used in machine translation tasks and obtain...
An attentional mechanism has lately been used to improve neural machine translation (NMT) by select...
The integration of syntactic structures into Transformer machine translation has shown positive resu...