Igbo is a low-resource African language whose orthographic and tonal diacritics capture distinctions between words that matter for both meaning and pronunciation, and are therefore potentially valuable for a range of language processing tasks. However, such diacritics are often largely absent from the electronic texts we might want to process or assemble into corpora, so effective methods for automatic diacritic restoration for Igbo are needed. In this paper, we experiment with an Igbo Bible corpus, which is extensively marked for vowel distinctions and partially marked for tonal distinctions, and attempt to reinstate these diacritics after they have been deleted. We investigate a number of word-level diacritic restor...
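To make the task concrete, the sketch below strips diacritics from a token and restores them with a most-frequent-variant lookup. It is an illustrative baseline only, and not necessarily one of the methods the paper evaluates. The function names (`strip_diacritics`, `train_baseline`, `restore`) and the toy token list are assumptions introduced here, not taken from the paper or its corpus.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(word: str) -> str:
    """Remove combining marks (e.g. dots below, tone marks) from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def train_baseline(tokens):
    """For each stripped form, count how often each marked variant occurs."""
    variants = defaultdict(Counter)
    for tok in tokens:
        marked = unicodedata.normalize("NFC", tok)
        variants[strip_diacritics(marked)][marked] += 1
    return variants


def restore(tokens, variants):
    """Replace each stripped token with its most frequent marked variant.

    Tokens never seen in training are returned unchanged.
    """
    restored = []
    for tok in tokens:
        counts = variants.get(tok)
        restored.append(counts.most_common(1)[0][0] if counts else tok)
    return restored


# Toy training tokens (hypothetical, not drawn from the Bible corpus).
train_tokens = ["ọ", "bụ", "ákwà", "ọ", "bụ", "àkwá", "ákwà"]
variants = train_baseline(train_tokens)

# Restore a diacritic-free sequence; "xyz" is unseen and passes through.
print(restore(["o", "bu", "akwa", "xyz"], variants))
```

A lookup of this kind ignores context, so an ambiguous stripped form such as `akwa` always receives the same variant regardless of the surrounding words.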
Part-of-speech (POS) tagging is a well-established technology for most Western European languages an...
Statistical language models are utilized in many speech processing algorithms, e.g., automatic speec...
Computational studies of the Igbo language are constrained by the non-availability of large electronic corp...
Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and...
Igbo is a low-resource language spoken by approximately 30 million people worldwide. It is the nativ...
With natural language processing (NLP), researchers aim to get the computer to identify and understa...
The orthography of many resource-scarce languages includes diacritically marked characters. Falling ...
NLP research on low-resource African languages is often impeded by the unavailability of basic resou...
Natural Language Processing (NLP) research is still in its infancy in Africa. Most of the languages in ...
Abstract. The orthography of many resource-scarce languages includes diacritically marked characters...
This project aims to develop linguistic resources to support computational NLP research on the Igbo...
Diacritic Restoration is a necessity in the processing of languages with Latin-based scripts that uti...
The accuracy of an annotated corpus can be increased through evaluation and revision of the annota...