Word segmentation is a problem in several Asian languages that have no explicit word boundary delimiter, e.g. Chinese, Japanese, Korean and Thai. We propose to use feature-based approaches for Thai word segmentation. A feature can be anything that tests for specific information in the context around the word in question, such as context words and collocations. To automatically extract such features from a training corpus, we employ two learning algorithms, namely RIPPER and Winnow. Experimental results show that both algorithms appear to outperform the existing Thai word segmentation methods, especially for context-dependent strings.

1 Introduction

Word segmentation is a crucial problem in natural language processing for several Asian lang...
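The feature-based idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `extract_features`, the window size `k`, the feature tuples, and the threshold and update factors are all assumptions chosen for illustration, and RIPPER is not shown. The sketch treats a context word as any word within `k` positions of the target, treats a collocation as the immediately adjacent word on a given side, and trains a basic Winnow classifier with multiplicative weight updates.

```python
from collections import defaultdict


def extract_features(words, i, k=2):
    """Return context-word and collocation features for words[i].

    Hypothetical feature scheme, for illustration only.
    """
    feats = set()
    # Context words: any word within +/- k positions of the target.
    lo, hi = max(0, i - k), min(len(words), i + k + 1)
    for j in range(lo, hi):
        if j != i:
            feats.add(("ctx", words[j]))
    # Collocations: the adjacent word, tagged with its side (-1 left, 1 right).
    if i > 0:
        feats.add(("col", -1, words[i - 1]))
    if i + 1 < len(words):
        feats.add(("col", 1, words[i + 1]))
    return feats


class Winnow:
    """Basic Winnow: weights start at 1 and change only on mistakes."""

    def __init__(self, threshold, promote=2.0, demote=0.5):
        self.threshold = threshold
        self.promote = promote  # factor applied after a missed positive
        self.demote = demote    # factor applied after a false positive
        self.weights = defaultdict(lambda: 1.0)

    def score(self, feats):
        # Sum the weights of the features that are active in this example.
        return sum(self.weights[f] for f in feats)

    def predict(self, feats):
        return self.score(feats) >= self.threshold

    def update(self, feats, label):
        # Mistake-driven: do nothing when the prediction is already correct.
        if self.predict(feats) == label:
            return
        factor = self.promote if label else self.demote
        for f in feats:
            self.weights[f] *= factor
```

In a segmentation setting, one such classifier could be trained per ambiguous string, with each candidate segmentation as the label and the surrounding words as the feature source. How the paper combines classifiers or sets its parameters is not stated in the abstract, so that part is left out here.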
© 2021 The Authors. Published by ACL. This is an open access article available under a Creative Com...
Information overload is a problem in the Information Age and Information visualization is an approac...
Myanmar sentences are written as contiguous sequences of syllables with no characters delimiting thew...
In the Thai language, word boundaries are not explicitly marked; therefore, word segmentation is needed ...
A Thai written text is a string of symbols without explicit word boundary markup. A method for a dev...
Unlike English, the Thai language has no explicit sentence marker. Conventionally, a space is pl...
For languages without word boundary delimiters, dictionaries are needed for segmenting running texts...
Abstract. Word segmentation is an important task in natural language processing, especially for lang...
Abstract This paper discusses a Thai corpus, TaLAPi, fully annotated with word segmentation (WS), pa...
A sentence is typically treated as the minimal syntactic unit used to extract valuable information f...
This study aims to develop a word segmentation algorithm and solution for the Myanmar language. This is a ...
The aim of this thesis is to design and implement a computational linguistic module for analysing Th...
The Thai written language is one of the languages that do not have word boundaries. In order to di...
This study reports the development of a Myanmar word segmentation method using Unicode standard enco...
Since the Thai writing system has no explicit word and sentence boundaries, language sense in Thai depen...