[Peer reviewed] Pre-trained language models such as BERT have become ubiquitous in NLP, where they achieve state-of-the-art performance on most tasks. While such models are readily available for English and other widely spoken languages, they remain scarce for low-resource languages such as Luxembourgish. In this paper, we present LuxemBERT, a BERT model for Luxembourgish, which we create with the following approach: we augment the pre-training dataset with text from a closely related language that we partially translate using a simple method. We show that the resulting LuxemBERT model is effective for various NLP tasks: it outperforms a simple baseline built with ...
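The abstract does not spell out the partial-translation method. One simple way to realize the idea is a word-level dictionary lookup that replaces only the words it knows and leaves everything else untouched. The sketch below assumes German as the related language and uses a tiny hypothetical dictionary (`TOY_DE_LB`). The helper names `partially_translate` and `augment_corpus` are illustrative and are not taken from the LuxemBERT work.

```python
import re

# Hypothetical toy German -> Luxembourgish dictionary (an assumption for
# illustration; the source does not specify the dictionary it uses).
TOY_DE_LB = {
    "ich": "ech",
    "bin": "sinn",
    "nicht": "net",
    "und": "an",
    "heute": "haut",
}

WORD_RE = re.compile(r"\w+")


def partially_translate(sentence: str, lexicon: dict) -> tuple:
    """Replace every word found in `lexicon`; keep all other text unchanged.

    Returns the partially translated sentence and the fraction of word
    tokens that were replaced (0.0 for a sentence without words).
    """
    counts = {"words": 0, "replaced": 0}

    def swap(match: re.Match) -> str:
        word = match.group(0)
        counts["words"] += 1
        target = lexicon.get(word.lower())
        if target is None:
            return word  # unknown word: keep the source-language token
        counts["replaced"] += 1
        # Carry over capitalization of the source token (e.g. sentence start).
        return target.capitalize() if word[0].isupper() else target

    translated = WORD_RE.sub(swap, sentence)
    ratio = counts["replaced"] / counts["words"] if counts["words"] else 0.0
    return translated, ratio


def augment_corpus(lb_sentences: list, de_sentences: list, lexicon: dict) -> list:
    """Hypothetical helper: native Luxembourgish text plus partially
    translated German text, combined into one pre-training corpus."""
    augmented = list(lb_sentences)
    for sentence in de_sentences:
        translated, _ = partially_translate(sentence, lexicon)
        augmented.append(translated)
    return augmented


text, ratio = partially_translate("Ich bin heute nicht zu Hause.", TOY_DE_LB)
print(text)  # Ech sinn haut net zu Hause.
```

Words missing from the dictionary ("zu", "Hause") pass through unchanged, so the output is a German/Luxembourgish mix rather than a full translation. A real pipeline would need a much larger lexicon and would likely handle spelling and morphology differences that this sketch ignores.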
Despite some recent work, the ongoing research for the processing of Luxembourgish is still largely ...
We have developed an automatic speech recognition (ASR) system tailored to Luxembourgish, a low-reso...
This article is a report about compiling a corpus of Luxembourgish for investigation of word formati...
Despite the widespread use of pre-trained models in NLP, well-performing pre-trained models for low-...
Natural language processing of Low-Resource Languages (LRL) is often challenged by the lack of data....
Pre-trained language models have been dominating the field of natural language processing in recent ...
Large pretrained masked language models have become state-of-the-art solutions for many NLP problems....
The Grand Duchy of Luxembourg is a small country in Western Europe, which, despite its size, is an i...
Pretrained language models are now ubiquitous in Natural Languag... (Web site: https://camembert-model.fr)
Large pre-trained masked language models have become state-of-the-art solutions for many NLP problem...
Recent advances in NLP have significantly improved the performance of language models on a ...
Neural machine translation (NMT) is often described as ‘data hungry’ as it typically requires large ...
[International audience] Language models have become a key step to achieve state-of-the-art results in ...
[International audience] Luxembourgish is embedded in a multilingual context on the divide between Roma...