Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent works have also shown that results from high-resource languages cannot be easily transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùbá on both NER and topic classification. We show that, in combination with transfer learning or distant supervision, these models can achieve with as few as 10 or 100 labeled sentences the same performance as baselines trained on much more supervised data. However, we also find settings where...
While building automatic speech recognition (ASR) requires a large amount of speech and text data, t...
The paper demonstrates the feasibility and scalability of participatory research, with a case study ...
We present the results of the WMT'22 Shared Task on Large-Scale Machine Translation Evaluation for A...
Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several ...
Thesis (MSc)--Stellenbosch University, 2021. English abstract: The majority of African languages have...
African languages are spoken by over a billion people, but are underrepresented in NLP research and ...
Scaling multilingual representation learning beyond the hundred most frequent languages is challengi...
The paper describes the University of Cape Town's submission to the constrained track of the WMT22 S...
There are over 7000 languages spoken on earth, but many of these languages suffer from a dearth of n...
African languages are spoken by over a billion people, but are underrepresented in NLP research and ...
Language models are the foundation of current neural network-based models for natural language under...
Transfer learning has led to large gains in performance for nearly all NLP tasks while making downst...
Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognit...
For many (minority) languages, the resources needed to train large models are not available. We inve...
This paper investigates the potential of improving a hybrid automatic speech recognition model train...