For multilingual sequence-to-sequence pretrained language models (multilingual Seq2Seq PLMs), e.g., mBART, the self-supervised pretraining task covers a wide range of monolingual languages, e.g., 25 languages from CommonCrawl, while the downstream cross-lingual tasks are generally carried out on a bilingual language subset, e.g., English-German. This creates a cross-lingual data discrepancy, namely a \textit{domain discrepancy}, and a cross-lingual learning-objective discrepancy, namely a \textit{task discrepancy}, between the pretraining and finetuning stages. To bridge these cross-lingual domain and task gaps, we extend the vanilla pretrain-finetune pipeline with an extra code-switching restoration task. Specifically, the first stage employs the ...
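To make the code-switching restoration idea concrete, below is a minimal sketch of how such a training pair could be constructed: a source sentence is corrupted by swapping some words for bilingual-dictionary translations, and the model is then trained to restore the original sentence. The dictionary, sentence, and function names here are hypothetical toy examples, not the authors' actual implementation.

```python
import random

def code_switch(tokens, dictionary, ratio=0.3, seed=0):
    """Replace a fraction of source tokens with target-language dictionary translations."""
    rng = random.Random(seed)
    switched = []
    for tok in tokens:
        if tok.lower() in dictionary and rng.random() < ratio:
            switched.append(dictionary[tok.lower()])  # swap in the target-language word
        else:
            switched.append(tok)
    return switched

# Toy English-German lexicon (hypothetical, for illustration only).
en_de = {"the": "das", "house": "Haus", "is": "ist", "green": "gruen"}

source = "the house is green".split()
corrupted = code_switch(source, en_de)

# Restoration training pair: the Seq2Seq model reads the code-switched
# sentence and learns to reconstruct the original monolingual sentence.
print("input :", " ".join(corrupted))
print("target:", " ".join(source))
```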
Cross-lingual models trained on source language tasks possess the capability to directly transfer to...
In cross-lingual language understanding, machine translation is often utilized to enhance the transf...
Recent research has shown promise in multilingual modeling, demonstrating how a single model is capa...
In this paper, we introduce DOCmT5, a multilingual sequence-to-sequence language model pretrained wi...
Pre-trained multilingual language models show significant performance gains for zero-shot cross-ling...
Cross-lingual transfer learning with large multilingual pre-trained models can be an effective appro...
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NM...
While pretrained language models (PLMs) primarily serve as general purpose text encoders that can be...
Recent progress in task-oriented neural dialogue systems is largely focused on a handful of language...
Large-scale cross-lingual language models (LMs), such as mBERT, Unicoder and XLM, have achieved great...
Language model pre-training has achieved success in many natural language processing tasks. Existing...
Multilingual pretrained language models have demonstrated remarkable zero-shot...
Some Transformer-based models can perform cross-lingual transfer learning: those models can be train...
Pre-trained multilingual language models play an important role in cross-lingual natural language un...
Word alignment, which aims to extract lexicon translation equivalents between source and target sente...