In this paper, we propose a new unsupervised approach for word segmentation. The core idea of our approach is a novel word induction criterion called WordRank, which estimates the goodness of word hypotheses (character or phoneme sequences). We devise a method to derive exterior word boundary information from the link structures of adjacent word hypotheses and incorporate interior word boundary information to complete the model. Given WordRank, word segmentation can be modeled as an optimization problem. A Viterbi-style algorithm is developed to search for the optimal segmentation. Extensive experiments conducted on phonetic transcripts as well as standard Chinese and Japanese data sets demonstrate the effectiveness of our approa...
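The optimization view described in the abstract can be illustrated with a minimal dynamic-programming sketch. It is not the authors' implementation: the function `segment`, the `max_len` limit, and the toy scoring function `toy_score` are all hypothetical, and the scores are made-up placeholders standing in for a goodness measure such as WordRank, which is not computed here. The sketch only assumes that each word hypothesis receives a positive goodness score and that the best segmentation maximizes the product of those scores.

```python
import math

def segment(text, score, max_len=4):
    """Viterbi-style search: return the segmentation of `text` that
    maximizes the sum of log-scores of its word hypotheses.

    `score` is a hypothetical callable mapping a candidate word to a
    non-negative goodness value; candidates scoring 0 are skipped.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-score of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            s = score(text[start:end])
            if s <= 0 or best[start] == -math.inf:
                continue
            cand = best[start] + math.log(s)
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    if n > 0 and best[n] == -math.inf:
        return None  # no segmentation with positive score exists
    # Follow the back-pointers from the end to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]

# Made-up scores for illustration only; not derived from any data.
TOY_SCORES = {"the": 0.5, "dog": 0.5, "do": 0.2, "g": 0.05}

def toy_score(word):
    # Unknown single characters get a small fallback score so that
    # every input has at least one valid segmentation.
    return TOY_SCORES.get(word, 0.01 if len(word) == 1 else 0.0)

result = segment("thedog", toy_score)
```

With these toy scores, `segment("thedog", toy_score)` prefers `the` + `dog` over `the` + `do` + `g`, because the product of scores is larger. The search runs in time proportional to the input length times `max_len`.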
A basic task in first language acquisition likely involves discovering the bou...
This paper proposes a refined Hierarchical Dirichlet Process (HDP) model for unsupervised Chinese wo...
Adaptor grammars are a framework for expressing and performing inference over a variety of non-param...
This thesis proposes a fast and simple unsupervised word segmentation algorithm that utilizes the lo...
Developing better methods for segmenting continuous text into words is important for improving the p...
In this paper we consider the unsupervised word discovery from phonetic input. We employ ...
This paper addresses two remaining challenges in Chinese word segmentation. The challenge in HLT is ...
By exploiting unlabeled data for further performance improvement for Chinese word segmentation, this...
In this paper, we propose a joint model for unsupervised Chinese word segmentation (CWS). Inspired b...
It is often assumed that Minimum Description Length (MDL) is a good criterion for unsupervised word ...
The fact that words are not conventionally demarcated in Chinese orthography makes the process of wo...
From a cognitive point of view, words can be recognized based on learned data which can be obtained ...
Unsupervised speech processing methods are essential for applications ranging from zero-resource sp...
A Chinese sentence is typically written as a sequence of characters. However, a word, a logical sema...