Multilingual language models such as mBERT have seen impressive cross-lingual transfer to a variety of languages, but many languages remain excluded from these models. In this paper, we analyse the effect of pre-training with monolingual data for a low-resource language that is not included in mBERT – Maltese – with a range of pre-training setups. We conduct evaluations with the newly pre-trained models on three morphosyntactic tasks – dependency parsing, part-of-speech tagging, and named-entity recognition – and one semantic classification task – sentiment analysis. We also present a newly created corpus for Maltese, and determine the effect that the pre-training data size and domain have on the downstream performance. Our results show th...
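As a rough illustration of the pre-training setup described above, the sketch below continues masked-language-model pre-training of mBERT on monolingual Maltese text with the Hugging Face libraries. It is a minimal sketch, not the paper's exact configuration: the corpus file `maltese_corpus.txt`, the sequence length, and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch: continued MLM pre-training of mBERT on monolingual Maltese text.
# Assumptions (not from the paper): corpus path, max_length, batch size, epochs, LR.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# Raw Maltese sentences, one per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "maltese_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are masked for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mbert-maltese",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

The same recipe applies to the other setups the paper compares (e.g. pre-training from scratch instead of from the mBERT checkpoint, or varying the amount and domain of the pre-training text); only the initial checkpoint and the training corpus change.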