Word segmentation is a low-level NLP taskt hat is non-trivial for a considerable number of languages. In this paper, we present asequence tagging framework and apply it to word segmentation for a wide range of languages with different writing systems and typological characteristics. Additionally, we investigate the correlations between various typological factors and word segmentation accuracy. The experimental results indicate that segmentation accuracy is positively related to word boundary markers and negatively to the number of unique non-segmental terms. Based on the analysis, we design a small set of language-specific settings and extensively evaluate the segmentation system on the Universal Dependencies datasets. Our model obtains st...
Abstract. Word segmentation has been shown helpful for Chinese-to-English machine translation (MT), ...
Models of the acquisition of word segmen-tation are typically evaluated using phonem-ically transcri...
In this paper a grammatical-based wordsegmentation algorithm is presented. As thenature of Myanmar l...
Word segmentation is a low-level NLP taskt hat is non-trivial for a considerable number of languages...
Myanmar sentences are written as contiguoussequences of syllables with no characters delimiting thew...
This paper presents a model of language processing where word segmentation is an integral part of se...
We select three word segmentation models with psycholinguistic foundations - transitional probabilit...
Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) i...
Lexical dependencies abound in natural language: words tend to follow particular words or word categ...
This paper examines the problem of lexical segmentation in order to identify language-specific and l...
International audienceThis paper describes how a tokenizer can be trained from any dataset in the Un...
Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) i...
How can infants detect where words or morphemes start and end in the continuous stream of speech? Pr...
Word segmentation is the first step in Chinese information processing, and the performance of the se...
In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our ap...
Abstract. Word segmentation has been shown helpful for Chinese-to-English machine translation (MT), ...
Models of the acquisition of word segmen-tation are typically evaluated using phonem-ically transcri...
In this paper a grammatical-based wordsegmentation algorithm is presented. As thenature of Myanmar l...
Word segmentation is a low-level NLP taskt hat is non-trivial for a considerable number of languages...
Myanmar sentences are written as contiguoussequences of syllables with no characters delimiting thew...
This paper presents a model of language processing where word segmentation is an integral part of se...
We select three word segmentation models with psycholinguistic foundations - transitional probabilit...
Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) i...
Lexical dependencies abound in natural language: words tend to follow particular words or word categ...
This paper examines the problem of lexical segmentation in order to identify language-specific and l...
International audienceThis paper describes how a tokenizer can be trained from any dataset in the Un...
Two word-spotting experiments are reported that examine whether the Possible-Word Constraint (PWC) i...
How can infants detect where words or morphemes start and end in the continuous stream of speech? Pr...
Word segmentation is the first step in Chinese information processing, and the performance of the se...
In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our ap...
Abstract. Word segmentation has been shown helpful for Chinese-to-English machine translation (MT), ...
Models of the acquisition of word segmen-tation are typically evaluated using phonem-ically transcri...
In this paper a grammatical-based wordsegmentation algorithm is presented. As thenature of Myanmar l...