Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language, not seen during the fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model's internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific, language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly unchanged during fine-tuning, the task predictor has little importance for the transfer and can be reinitialized during fine-tuning.
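The layer ablation described above lends itself to a simple probe. The following is a minimal sketch, assuming the Hugging Face transformers library and a hypothetical fine-tuned mBERT checkpoint path: it rolls selected encoder layers of the fine-tuned model back to their pre-trained values, so that evaluating the result on a target-language test set indicates how much those layers' fine-tuned parameters actually matter for transfer. It illustrates the general idea rather than the exact procedure used in the work.

import torch
from transformers import AutoModel, AutoModelForSequenceClassification

PRETRAINED_NAME = "bert-base-multilingual-cased"
FINETUNED_PATH = "path/to/finetuned-mbert-checkpoint"  # hypothetical checkpoint, for illustration only

def roll_back_layers(finetuned_path, layer_indices):
    """Return the fine-tuned classifier with the chosen encoder layers
    restored to their original pre-trained mBERT parameters."""
    model = AutoModelForSequenceClassification.from_pretrained(finetuned_path)
    reference = AutoModel.from_pretrained(PRETRAINED_NAME)
    with torch.no_grad():
        for idx in layer_indices:
            # Both models share the same BertLayer structure (the fine-tuned
            # checkpoint is assumed to be mBERT-based), so the pre-trained
            # weights can be copied in directly.
            model.bert.encoder.layer[idx].load_state_dict(
                reference.encoder.layer[idx].state_dict()
            )
    return model

# Example: restore the lower eight layers (the putative multilingual encoder)
# and then evaluate the resulting model on a target-language test set.
ablated_model = roll_back_layers(FINETUNED_PATH, layer_indices=range(8))

Swapping the load_state_dict call for a fresh random initialization of the same layers would give the complementary ablation, discarding rather than restoring their fine-tuned parameters.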
Cross-lingual transfer learning with large multilingual pre-trained models can be an effective appro...
Pre-trained multilingual models, such as mBERT, XLM-R and mT5, are used to improve the performance o...
Deep learning models like BERT, a stack of attention layers with an unsupervis...
While recent work on multilingual language models has demonstrated their capacity for cross-lingual ...
It has been shown that multilingual BERT (mBERT) yields high quality multilingual representations ...
NLP systems typically require support for more than one language. As different languages have differ...
Knowledge transfer between neural language models is a widely used technique t...
In recent years, pre-trained Multilingual Language Models (MLLMs) have shown a strong a...
For many (minority) languages, the resources needed to train large models are not available. We inve...
Pre-trained multilingual language models play an important role in cross-lingual natural language un...
Recent work has shown evidence that the knowledge acquired by multilingual BERT (mBERT) has two comp...
Some Transformer-based models can perform cross-lingual transfer learning: those models can be train...
In this paper, we challenge a basic assumption of many cross-lingual transfer ...