Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and tonal diacritics. Diacritics are essential in capturing the distinctions in pronunciation and meaning of words, as well as in lexical disambiguation. Unfortunately, most electronic texts in diacritic languages are written without diacritics. This makes diacritic restoration a necessary step in corpus building and language processing tasks for languages with diacritics. In our previous work, we built some n-gram models with simple smoothing techniques based on a closed-world assumption. However, as a classification task, diacritic restoration is well suited for and will be more generalisable with machine learning. This paper, ...
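The n-gram approach summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram order, add-k smoothing, greedy left-to-right decoding, the class name `BigramRestorer` and the toy Igbo sentences are all assumptions chosen for illustration. The closed-world assumption is modelled by allowing a stripped word to be restored only to a diacritized variant seen during training; unseen words pass through unchanged.

```python
import unicodedata
from collections import Counter, defaultdict


def strip_diacritics(text: str) -> str:
    """Remove tonal and orthographic marks by dropping combining characters after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class BigramRestorer:
    """Hypothetical bigram restorer with add-k smoothing and greedy decoding.

    Closed-world assumption: candidates for a stripped word are only the
    diacritized variants observed in the training sentences.
    """

    def __init__(self, k: float = 1.0):
        self.k = k
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.variants = defaultdict(set)  # stripped form -> seen diacritized forms

    def train(self, sentences):
        for sent in sentences:
            tokens = ["<s>"] + sent.split()
            for tok in tokens[1:]:
                self.variants[strip_diacritics(tok)].add(tok)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))

    def _prob(self, prev: str, word: str) -> float:
        # Add-k smoothed P(word | prev); Counter returns 0 for unseen keys.
        vocab = len(self.unigrams)
        return (self.bigrams[(prev, word)] + self.k) / (self.unigrams[prev] + self.k * vocab)

    def restore(self, text: str) -> str:
        prev, out = "<s>", []
        for tok in text.split():
            # Unseen words are left as they are (closed world).
            candidates = self.variants.get(strip_diacritics(tok), {tok})
            # Sorting makes tie-breaking deterministic.
            best = max(sorted(candidates), key=lambda w: self._prob(prev, w))
            out.append(best)
            prev = best
        return " ".join(out)


restorer = BigramRestorer()
restorer.train(["nne zụtara àkwá", "nwa na-ebe ákwá"])  # toy training data
print(restorer.restore("nne zutara akwa"))
print(restorer.restore("nwa na-ebe akwa"))
```

In this toy setup, the preceding word is the only evidence that separates `àkwá` from `ákwá`, which are identical once stripped. This is the lexical ambiguity that motivates restoration. A fuller system would likely use Viterbi decoding over the whole sentence instead of the greedy choice shown here.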
Diacritic Restoration is a necessity in the processing of languages with Latin-based scripts that uti...
In this paper we present a statistical approach for automatic diacritization o...
Existing NLP models are mostly trained with data from well-resourced languages. Most minority langua...
Igbo is a low-resource African language with orthographic and tonal diacritics, which capture distin...
With natural language processing (NLP), researchers aim to get the computer to identify and understa...
Igbo is a low-resource language spoken by approximately 30 million people worldwide. It is the nativ...
The orthography of many resource-scarce languages includes diacritically marked characters. Falling ...
NLP research on low-resource African languages is often impeded by the unavailability of basic resou...
The orthography of many resource-scarce languages includes diacritically marked characters...
Natural Language Processing (NLP) research is still in its infancy in Africa. Most of the languages in ...
This project aims to develop linguistic resources to support computational NLP research on the Igbo...
Scanning through bookshelves in university/college libraries or bookstands in bookshops for written ...
Computational studies of the Igbo language are constrained by the non-availability of large electronic corp...
Ungrammaticality is a phenomenon that is not associated with the use of the mother tongue. This is b...