For many (minority) languages, the resources needed to train large models are not available. We investigate the performance of zero-shot transfer learning with as little data as possible, and the influence of language similarity in this process. We retrain the lexical layers of four BERT-based models using data from two low-resource target language varieties, while the Transformer layers are independently fine-tuned on a POS-tagging task in the model's source language. By combining the new lexical layers and fine-tuned Transformer layers, we achieve high task performance for both target languages. With high language similarity, 10MB of data appears sufficient to achieve substantial monolingual transfer performance. Monolingual BERT-based mo...
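A minimal sketch of the recipe described in this abstract, using Hugging Face Transformers: a new WordPiece lexical layer is retrained on target-language text with the Transformer layers frozen, the Transformer layers are separately fine-tuned on source-language POS tagging with the lexical layer frozen, and the two parts are then recombined. The model name, corpus path, vocabulary size, and label count below are illustrative placeholders, not the authors' actual configuration.

```python
# Hedged sketch of lexical-layer retraining plus source-language POS fine-tuning.
# SOURCE_MODEL, TARGET_CORPUS, vocab size and label count are placeholders.
from tokenizers import BertWordPieceTokenizer
from transformers import (
    AutoModelForMaskedLM,
    AutoModelForTokenClassification,
    BertTokenizerFast,
)

SOURCE_MODEL = "bert-base-cased"        # hypothetical source-language BERT
TARGET_CORPUS = "target_language.txt"   # small raw corpus (e.g. ~10MB) in the target variety

# 1) New lexical layer: train a WordPiece vocabulary on the target-language corpus.
wordpiece = BertWordPieceTokenizer(lowercase=False)
wordpiece.train(files=[TARGET_CORPUS], vocab_size=30_000)
wordpiece.save_model(".")               # writes ./vocab.txt
target_tokenizer = BertTokenizerFast(vocab_file="vocab.txt", do_lower_case=False)

# 2) Retrain only the embeddings with masked language modelling;
#    all Transformer layers stay frozen.
mlm = AutoModelForMaskedLM.from_pretrained(SOURCE_MODEL)
mlm.resize_token_embeddings(len(target_tokenizer))
mlm.get_input_embeddings().weight.data.normal_(mean=0.0, std=0.02)  # fresh lexical layer
for name, param in mlm.named_parameters():
    # keep the embeddings (and the tied MLM head) trainable, freeze the encoder
    param.requires_grad = name.startswith(("bert.embeddings", "cls"))
# ... run standard MLM training on TARGET_CORPUS here (e.g. Trainer with
#     DataCollatorForLanguageModeling); omitted for brevity ...

# 3) Independently fine-tune the Transformer layers on source-language POS
#    tagging, with the original source-language lexical layer frozen.
tagger = AutoModelForTokenClassification.from_pretrained(SOURCE_MODEL, num_labels=17)  # 17 UPOS tags
for name, param in tagger.named_parameters():
    param.requires_grad = not name.startswith("bert.embeddings")
# ... fine-tune `tagger` on source-language POS-tagged data here; omitted ...

# 4) Combine: plug the retrained target-language embeddings into the
#    source-language POS tagger and tag target-language text zero-shot.
tagger.resize_token_embeddings(len(target_tokenizer))
tagger.bert.embeddings.load_state_dict(mlm.bert.embeddings.state_dict())
```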
Transfer learning has led to large gains in performance for nearly all NLP tasks while making downst...
Multilingual pretrained language models have demonstrated remarkable zero-shot...
Peeking into the inner workings of BERT has shown that its layers resemble the classical NLP pipelin...
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems...
Multilingual language models are widely used to extend NLP systems to low-resource languages. Howeve...
Pretrained multilingual contextual representations have shown great success, but due to the limits o...
Some natural languages belong to the same family or share similar syntactic and/or semantic regulari...
Cross-lingual transfer learning with large multilingual pre-trained models can be an effective appro...
NLP technologies are uneven for the world's languages as the state-of-the-art models are only availa...
Large pre-trained masked language models have become state-of-the-art solutions for many NLP problem...
Multilingual language models such as mBERT have seen impressive cross-lingual transfer to a variety ...
Transfer learning based on pretraining language models on a large amount of ra...