Adaptor grammars are a framework for expressing and performing inference over a variety of non-parametric linguistic models. These models currently provide state-of-the-art performance on unsupervised word segmentation from phonemic representations of child-directed unsegmented English utterances. This paper investigates the applicability of these models to unsupervised word segmentation of Mandarin. We investigate a wide variety of segmentation models, and show that the best segmentation accuracy is obtained from models that capture interword "collocational" dependencies. Surprisingly, enhancing the models to exploit syllable structure regularities and to capture tone information does improve overall word segmentation accuracy, p...
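Segmentation accuracy in this line of work is commonly reported as token precision, recall and F-score: a predicted word counts as correct only if both of its boundaries match a gold word. The sketch below is a minimal, assumed implementation of that metric; the function names `word_spans` and `token_f_score` are hypothetical, and it is not claimed to be the evaluation script used in the paper.

```python
def word_spans(words):
    """Convert a segmented utterance (list of words) into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def token_f_score(gold, predicted):
    """Token precision, recall and F1 over parallel lists of segmented utterances."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, predicted):
        # Both segmentations must spell out exactly the same unsegmented string.
        assert "".join(g) == "".join(p), "segmentations must cover the same string"
        gs, ps = word_spans(g), word_spans(p)
        correct += len(gs & ps)  # words whose start and end both match
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


p, r, f = token_f_score([["我", "爱", "北京"]], [["我", "爱北京"]])
print(p, r, f)
```

Here the prediction merges 爱 and 北京, so only 我 is correct: precision is 1/2, recall is 1/3 and the F-score is about 0.4.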
This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverage...
This paper proposes a refined Hierarchical Dirichlet Process (HDP) model for unsupervised Chinese wo...
This paper addresses two remaining challenges in Chinese word segmentation. The challenge in HLT is ...
In this paper, we present an unsupervised segmentation system tested on Mandar...
This dissertation addresses the question of wordhood and unsupervised word identification in written...
This paper presents a Chinese word segmentation system that uses improved source-channel models of C...
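In the basic source-channel view of segmentation, the segmenter chooses the word sequence W that maximizes P(W)P(S|W) for an input character string S; for plain segmentation P(S|W) is 1 for every W that spells out S, so the search reduces to maximizing P(W). The sketch below shows only the simplest instance of that idea, a word-unigram model decoded with Viterbi-style dynamic programming. It is not the paper's improved models: the name `viterbi_segment`, the toy log-probabilities, the maximum word length and the penalty for unknown single characters are all hypothetical choices.

```python
import math


def viterbi_segment(text, word_logprob, max_word_len=4, unk_logprob=-20.0):
    """Return the segmentation of `text` with the highest total unigram log-probability."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_logprob:
                lp = word_logprob[word]
            elif len(word) == 1:
                lp = unk_logprob  # unknown single characters keep every string segmentable
            else:
                continue
            score = best[start] + lp
            if score > best[end]:
                best[end], back[end] = score, start
    # Follow back-pointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


lexicon = {"北京": math.log(0.1), "大学": math.log(0.1), "北京大学": math.log(0.05),
           "生": math.log(0.01), "大学生": math.log(0.02)}
print(viterbi_segment("北京大学生", lexicon))
```

With these toy probabilities, 北京/大学生 (0.1 × 0.02) outscores 北京大学/生 (0.05 × 0.01), so the decoder returns `["北京", "大学生"]`.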
In this paper, we propose a joint model for unsupervised Chinese word segmentation (CWS). Inspired b...
By exploiting unlabeled data to further improve Chinese word segmentation performance, this...
In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our ap...
We conducted experiments on forced alignment in Mandarin Chinese. A corpus of 7,849 utterances was c...
It is often assumed that Minimum Description Length (MDL) is a good criterion for unsupervised word ...
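To make the MDL criterion concrete, here is one simple two-part formulation, assumed for illustration rather than taken from the paper: the cost of a segmentation is the number of bits needed to spell out its lexicon plus the number of bits needed to encode the corpus as a sequence of lexicon entries under a maximum-likelihood unigram model. The function name `description_length` and the uniform code over characters plus an end-of-word marker are hypothetical choices.

```python
import math
from collections import Counter


def description_length(segmented_corpus):
    """Two-part MDL cost in bits: (lexicon bits, corpus bits, total bits)."""
    tokens = [w for utt in segmented_corpus for w in utt]
    counts = Counter(tokens)
    alphabet = {ch for w in counts for ch in w}
    # Lexicon cost: each distinct word is spelled out character by character plus an
    # end marker, using a uniform code over the alphabet and the end marker.
    bits_per_symbol = math.log2(len(alphabet) + 1)
    lexicon_bits = sum((len(w) + 1) * bits_per_symbol for w in counts)
    # Corpus cost: each token is encoded with its relative-frequency unigram probability.
    total = len(tokens)
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits, corpus_bits, lexicon_bits + corpus_bits


whole = description_length([["ab"], ["ab"]])
split = description_length([["a", "b"], ["a", "b"]])
print(whole[2] < split[2])
```

For the toy corpus "ab ab", treating "ab" as a single word costs about 4.75 bits (lexicon only, since the corpus is then free to encode), while splitting into characters costs about 10.34 bits, so under this formulation MDL prefers the longer word.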
The Chinese language, unlike English, is written without marked word boundaries, and Chinese word se...