The orthography of many resource-scarce languages includes diacritically marked characters. Falling outside the scope of the standard Latin encoding, these characters are often represented in digital language resources as their unmarked equivalents. This renders corpus compilation more difficult, as these languages typically do not have the benefit of large electronic dictionaries to perform diacritic restoration. This paper describes experiments with a machine learning approach that is able to automatically restore diacritics on the basis of local graphemic context. We apply the method to the African languages of Ciluba, Gikuyu, Kikamba, Maa, Sesotho sa Leboa, Tshivenda and Yoruba and contrast it with experiments on Czech, Dutch, French, G...
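The idea of restoring diacritics from local graphemic context alone can be illustrated with a small sketch. It is not the learner used in the paper, which the abstract does not specify. It is a count-based stand-in that looks up a window of unmarked characters around each position and backs off to narrower windows when the wide one was never seen. The names `strip_char`, `windows`, `train`, `restore`, the padding symbol and the window width of 3 are all assumptions made for illustration, and the sketch assumes every marked character is a single precomposed code point.

```python
import unicodedata
from collections import Counter, defaultdict

PAD = "_"  # hypothetical padding symbol for positions outside the text


def strip_char(ch):
    """Return the unmarked base of one character (e.g. 'é' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base[:1] or ch  # keep exactly one character per input character


def windows(plain, width):
    """Yield (index, context) with `width` characters on each side of the focus."""
    padded = PAD * width + plain + PAD * width
    for i in range(len(plain)):
        yield i, padded[i:i + 2 * width + 1]


def train(text, max_width=3):
    """For each window width, count which marked character each context maps to."""
    text = unicodedata.normalize("NFC", text)
    plain = "".join(strip_char(ch) for ch in text)
    model = {w: defaultdict(Counter) for w in range(max_width + 1)}
    for w in model:
        for i, ctx in windows(plain, w):
            model[w][ctx][text[i]] += 1
    return model


def restore(plain, model):
    """Restore diacritics, trying the widest known context first (back-off)."""
    out = list(plain)
    decided = [False] * len(plain)
    for w in sorted(model, reverse=True):
        for i, ctx in windows(plain, w):
            if not decided[i] and ctx in model[w]:
                out[i] = model[w][ctx].most_common(1)[0][0]
                decided[i] = True
    return "".join(out)  # characters never seen in training stay unmarked


# Toy training text (Czech pangram fragment); a real system needs a full corpus.
sample = "příliš žluťoučký kůň úpěl ďábelské ódy"
model = train(sample)
print(restore("kun", model))
```

Because the model only needs marked text to train on, it requires no dictionary, which is the property the abstract highlights for resource-scarce languages. A width-0 window reduces to picking the most frequent marked form of each character, which gives a natural baseline to compare against.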
In this paper, we describe a method based on statistical machine translation (SMT) that is able to r...
Much of the text data that exists in many languages is locked away in nondigitized books and documen...
Despite the modern boom in technology, we are still faced with the fact that people write texts with...
Abstract. The orthography of many resource-scarce languages includes diacritically marked characters...
Igbo is a low-resource African language with orthographic and tonal diacritics, which capture distin...
Statistical language models are utilized in many speech processing algorithms, e.g., automatic speec...
Corpus of texts in 12 languages. For each language, we provide one training, one development and one...
Igbo is a low-resource language spoken by approximately 30 million people worldwide. It is the nativ...
Online ISSN: 2335-884X. http://itc.ktu.lt/index.php/ITC/article/view/18066 In this research we compar...
Abstract. This paper presents a method for diacritics restoration based on learning mechanisms that ...
Properly written texts in Igbo, a low-resource African language, are rich in both orthographic and...
The goal of this thesis is to develop a Java MIDP application for automatic reconstruction of the di...
With natural language processing (NLP), researchers aim to get the computer to identify and understa...
Diacritics restoration became a ubiquitous task in the Latin-alphabet-based, English-dominated Interne...