Multilingual pretrained language models have demonstrated remarkable zero-shot cross-lingual transfer capabilities. Such transfer emerges by fine-tuning on a task of interest in one language and evaluating on a distinct language not seen during fine-tuning. Despite promising results, we still lack a proper understanding of the source of this transfer. Using a novel layer ablation technique and analyses of the model's internal representations, we show that multilingual BERT, a popular multilingual language model, can be viewed as the stacking of two sub-networks: a multilingual encoder followed by a task-specific language-agnostic predictor. While the encoder is crucial for cross-lingual transfer and remains mostly u...
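The encoder/predictor view described above can be made concrete with a short sketch. The following is a minimal illustration only, assuming the HuggingFace transformers library and the public bert-base-multilingual-cased checkpoint; the split index SPLIT is a hypothetical choice for demonstration, not a value or procedure taken from the paper.

    # Illustrative sketch only (not the paper's ablation code): treat the lower
    # layers of multilingual BERT as a frozen "multilingual encoder" and leave
    # the upper layers trainable as a "task-specific predictor" before
    # fine-tuning on a downstream task.
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-multilingual-cased")

    SPLIT = 8  # hypothetical boundary between "encoder" and "predictor" layers

    # Freeze the embeddings and the first SPLIT transformer layers.
    for param in model.embeddings.parameters():
        param.requires_grad = False
    for layer in model.encoder.layer[:SPLIT]:
        for param in layer.parameters():
            param.requires_grad = False

    # The remaining upper layers stay trainable for the downstream task.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Trainable parameters above layer {SPLIT}: {trainable:,}")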
Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipelin...
When humans read a text, their eye movements are influenced by the structural complexity of the inpu...
Deep learning models like BERT, a stack of attention layers with an unsupervis...
Knowledge transfer between neural language models is a widely used technique t...
While recent work on multilingual language models has demonstrated their capacity for cross-lingual ...
Recent work has shown evidence that the knowledge acquired by multilingual BERT (mBERT) has two comp...
NLP systems typically require support for more than one language. As different languages have differ...
Multilingual pre-trained language models perform remarkably well on cross-lingual transfer for downs...
For many (minority) languages, the resources needed to train large models are not available. We inve...
It has been shown that multilingual BERT (mBERT) yields high quality multilingual representations ...
Recent research has shown promise in multilingual modeling, demonstrating how a single model is capa...
Some Transformer-based models can perform cross-lingual transfer learning: thos...
We analyze if large language models are able to predict patterns of human reading behavior. We compa...