Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2023. This study investigated whether a multilingual Bantu pretraining corpus could be created from freely available data. The dataset was built from Bantu text extracted from datasets that are freely available online (mainly from Huggingface). The resulting multilingual language model (BantuBERTa), pretrained on this data, proved to be predictive across multiple Bantu languages on a higher-order NLP task (NER) and on a simpler NLP task (classification). This indicates that the dataset can be used for Bantu multilingual pretraining and transfer to multiple Bantu languages. Additionally, it was researched whether using this Bantu dataset could benefit transfer lea...
Language identification is an important pre-process in many data management and information retrieva...
Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, ...
Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great...
There are over 7000 languages spoken on earth, but many of these languages suffer from a dearth of n...
Thesis (MSc)--Stellenbosch University, 2021. ENGLISH ABSTRACT: The majority of African languages have...
The paper describes the University of Cape Town's submission to the constrained track of the WMT22 S...
Language models are the foundation of current neural network-based models for natural language under...
Mini Dissertation (MIT (Big Data Science))--University of Pretoria, 2022. South Africa has eleven off...
The Bantu family is the largest African language family in terms of geographic and demographic sprea...
This paper describes an endeavour to build natural language processing (NLP)...
Over the past five years neural network models have been successful across a range of computational ...
Multilingual pre-trained language models (PLMs) have demonstrated impressive performance on several ...