Various studies show that pretrained language models such as BERT cannot straightforwardly replace encoders in neural machine translation, despite their enormous success in other tasks. This is all the more surprising given the similarity of the two architectures. This paper sheds light on the embedding spaces they create, comparing them with average cosine similarity, contextuality metrics, and measures of representational similarity, and reveals that BERT and NMT encoder representations look significantly different from one another. To address this mismatch, we propose a supervised transformation from one space into the other, using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to ...
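To make the comparison and alignment concrete, the following is a minimal sketch, not the paper's actual procedure: it computes the average pairwise cosine similarity within a set of token representations and fits a simple least-squares linear map from one representation space into the other. The array shapes, function names, and the choice of a plain linear mapping (rather than the explicit alignment and fine-tuning described above) are illustrative assumptions.

```python
# Sketch only: compare two representation spaces and fit a linear alignment.
# Assumes token-aligned representation matrices (one row per token); the random
# placeholder data below stands in for real BERT / NMT encoder outputs.
import numpy as np

def average_cosine_similarity(reps: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of row vectors in `reps`."""
    normed = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    sims = normed @ normed.T                      # pairwise cosine similarities
    n = reps.shape[0]
    off_diag = sims[~np.eye(n, dtype=bool)]      # drop self-similarities
    return float(off_diag.mean())

def fit_linear_alignment(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Least-squares map W such that src @ W approximates tgt (rows token-aligned)."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bert_reps = rng.normal(size=(100, 768))       # placeholder BERT token vectors
    nmt_reps = rng.normal(size=(100, 512))        # placeholder NMT encoder vectors
    print("avg cos (BERT):", average_cosine_similarity(bert_reps))
    print("avg cos (NMT): ", average_cosine_similarity(nmt_reps))
    W = fit_linear_alignment(bert_reps, nmt_reps)
    mapped = bert_reps @ W                        # BERT space mapped into NMT space
    residual = np.linalg.norm(mapped - nmt_reps) / np.linalg.norm(nmt_reps)
    print("relative alignment residual:", residual)
```

A higher average cosine similarity within one space than the other indicates a less dispersed (more anisotropic) embedding space; a large residual after fitting the map suggests that a simple linear transformation alone does not reconcile the two spaces.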