In this paper, we propose a joint model for unsupervised Chinese word segmentation (CWS). Inspired by the “products of experts” idea, our joint model first combines two generative models, a word-based hierarchical Dirichlet process model and a character-based hidden Markov model, by simply multiplying their probabilities together. Gibbs sampling is used for model inference. To further exploit the strengths of goodness-based models, we then integrate nVBE into our joint model by using it to initialize the Gibbs sampler. We conduct our experiments on the PKU and MSRA datasets provided by the second SIGHAN bakeoff. Test results on these two datasets show that the joint model achieves much better results than all of its com...
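The "multiply the probabilities together" step can be illustrated with a minimal sketch. It assumes two experts that each give a distribution over the same two outcomes at one character position; the function name `product_of_experts`, the outcome labels, and the numbers are hypothetical and are not taken from the paper, which applies the idea to an HDP word model and an HMM character model inside a Gibbs sampler.

```python
import random


def product_of_experts(p_a, p_b):
    """Multiply two distributions over the same outcomes and renormalize."""
    if p_a.keys() != p_b.keys():
        raise ValueError("both experts must score the same outcomes")
    unnormalized = {k: p_a[k] * p_b[k] for k in p_a}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("experts assign zero joint mass to every outcome")
    return {k: v / total for k, v in unnormalized.items()}


def sample_outcome(dist, rng):
    """Draw one outcome from a normalized distribution (one sampling step)."""
    outcomes = list(dist)
    weights = [dist[k] for k in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


# Hypothetical scores for a single candidate boundary (illustrative only).
word_model = {"boundary": 0.6, "no_boundary": 0.4}
char_model = {"boundary": 0.8, "no_boundary": 0.2}

combined = product_of_experts(word_model, char_model)
decision = sample_outcome(combined, random.Random(0))
```

Because the product is renormalized, an outcome keeps high probability only when both experts support it, which is the motivation usually given for products of experts.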
In this paper we present a two-stage statistical word segmentation system for Chinese based on word ...
Since the traditional word-based n-gram model, a generative approach, cannot handle those ...
It is often assumed that Minimum Description Length (MDL) is a good criterion for unsupervised word ...
This paper proposes a refined Hierarchical Dirichlet Process (HDP) model for unsupervised Chinese wo...
There are two dominant approaches to Chinese word segmentation: word-based and character-based model...
Unsupervised word segmentation (UWS) can provide domain-adaptive segmentation for statistical machi...
Current character-based approaches are not robust for cross domain Chinese word segmentation. In thi...
In this paper, we present an unsupervised segmentation system tested on Mandar...
This paper presents a bilingual semi-supervised Chinese word segmentation (CWS) method that leverage...
This paper presents a novel approach to Chinese word segmentation (CWS) that attempts to utilize glo...
A Chinese sentence is typically written as a sequence of characters. However, a word, a logical sema...