Abstract
Word segmentation is the first step in processing languages written in non-Latin scripts such as Javanese script. In this study, we report our work on word segmentation using a dictionary-based approach. In the first phase, we generate all possible segmented word sequences using a word dictionary. The correct word is then selected based on the last character in a word, the last two characters in a word, the difference between two consecutive words, and the frequency of the word in an additional corpus. The experimental results show that identifying words by their frequency in the additional corpus yields the best accuracy, 91.08%.
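The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `all_segmentations` and `pick_by_frequency` are hypothetical, the toy dictionary uses Latin letters instead of Javanese script, and scoring a candidate by the mean corpus frequency of its words is an assumption, since the abstract does not say how the frequencies are combined.

```python
from functools import lru_cache


def all_segmentations(text, dictionary):
    """Return every way to split `text` into words found in `dictionary`."""
    max_len = max(map(len, dictionary), default=0)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the text yields one valid (empty) continuation.
        if start == len(text):
            return [[]]
        results = []
        # Try every dictionary word that begins at position `start`.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in dictionary:
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


def pick_by_frequency(candidates, freq):
    """Pick the candidate whose words have the highest mean corpus frequency.

    The mean is an assumed scoring rule, chosen only for illustration.
    Returns None when there are no candidates.
    """
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda words: sum(freq.get(w, 0) for w in words) / len(words),
    )


# Toy data (hypothetical): two competing segmentations of "abc".
toy_dictionary = {"a", "ab", "bc", "c"}
toy_freq = {"a": 5, "bc": 1, "ab": 10, "c": 4}

candidates = all_segmentations("abc", toy_dictionary)
best = pick_by_frequency(candidates, toy_freq)
```

Here `candidates` holds both `["a", "bc"]` and `["ab", "c"]`, and the frequency rule prefers `["ab", "c"]` because its words are more common in the toy corpus counts. Enumerating every segmentation can grow exponentially with text length, so a practical system would likely prune or score candidates during the search.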
Word segmentation is a basic task and an important problem in natural language processing. In Myanmar ...
A Thai written text is a string of symbols without explicit word boundary markup. A method for a dev...
This study aims to develop a word segmentation algorithm and solution for the Myanmar language. This is a ...
For languages without word boundary delimiters, dictionaries are needed for segmenting running texts...
In the Thai language, word boundaries are not explicitly marked; therefore, word segmentation is needed ...
Word segmentation is a problem in several Asian languages that have no explicit word boundary delimi...
Compounding is a highly productive word-formation process in some languages that is often problemati...
Dzongkha, the national language of Bhutan, is continuous in written form and it fails to mark the w...
The Thai written language is one of the languages that does not have word boundaries. In order to di...
The need for a complete corpus is now crucial, especially for linguists. In order to a...
Automation of Javanese script translation is needed to make it easier for people to understand the m...
Sentence segmentation that breaks textual data strings into individual sentences is an important pha...
This paper deals with lexicon and system development for word segmentation in the Bangla language. Our go...
Myanmar sentences are written as contiguous sequences of syllables with no characters delimiting the w...
Due to the growth of electronic documents and the incessant increase of the power and capacity of co...