Words and n-grams are commonly used units for representing Chinese text and have proved to be good features for Chinese text categorization and information retrieval. However, their effectiveness for Chinese text clustering has not yet been established. This paper is a comparative study of representing units in Chinese text clustering. Using the K-means algorithm, we evaluated several representing units, including Chinese character n-gram features, word features, and their combinations. Chinese word features, character unigram features, and character bigram features were the most effective in our experiments. Combining features did not improve the results. Detailed experimental results on several public Chinese Text Cat...
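As a rough illustration of the character n-gram representing units named in the abstract, the sketch below extracts overlapping character unigrams and bigrams from a string and compares two documents by cosine similarity. The function names (`char_ngrams`, `ngram_vector`, `cosine`) and the whitespace handling are assumptions made for this example and are not taken from the paper; the resulting count vectors could be passed to any K-means implementation.

```python
from collections import Counter
import math


def char_ngrams(text: str, n: int) -> list[str]:
    """Return overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_vector(text: str, sizes=(1, 2)) -> Counter:
    """Count n-grams of each requested size (unigrams and bigrams by default)."""
    counts = Counter()
    for n in sizes:
        counts.update(char_ngrams(text, n))
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    doc_a = ngram_vector("中文文本聚类")
    doc_b = ngram_vector("中文文本分类")
    print(char_ngrams("中文文本", 2))
    print(round(cosine(doc_a, doc_b), 3))
```

Passing `sizes=(1,)` or `sizes=(2,)` gives a unigram-only or bigram-only representation, which is closer to the single-unit settings the abstract reports as most effective.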
The process of text categorization involves some understanding of the content of the doc...
This paper focuses on the high dimensional text problems encountered in text classification. Document...
Personal name disambiguation is a significant issue in natural language processing, which is the bas...
This paper is a comparative study on representing units in Chinese text categorization. Several kind...
In the processing of Chinese documents and queries in information retrieval (IR), one has to identi...
In the processing of Chinese documents and queries in information retrieval (IR), one has to identif...
Text clustering is an important means and method in text mining. The process of Chinese text...
In recent years, there has been an increasing interest in data clustering of short documents. Existi...
In this paper, we propose and evaluate approaches to categorizing Chinese texts, which c...
Tibetan text clustering has potential in the Tibetan information processing domain. In this paper, clust...
A Chinese character embedded in different compound words may carry different meanings. In this paper...
Mie University, Graduate School of Engineering, Master's Program, Division of Information Engineering. Automatic text classification (ATC) is the task of automatically assigning one ...
Giving further consideration to linguistic features, this study proposes an algorithm of Ch...
Automatic text classification (ATC) is the task of automatically assigning one or more appropriate c...