In this paper, we propose and evaluate approaches to categorizing Chinese texts, which consist of term extraction, term selection, term clustering, and text classification. We propose a scalable approach that uses frequency counts to identify the left and right boundaries of possibly significant terms. We combine term selection and term clustering to reduce the dimension of the vector space to a practical level. Although the huge number of possible Chinese terms makes most machine learning algorithms impractical, results from an experiment on a CAN news collection show that the dimension can be reduced dramatically to 1200 while maintaining approximately the same level of classification accuracy using...
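The abstract does not spell out how frequency counts locate term boundaries, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a character n-gram is a likely term when it is frequent and when no single one-character extension on its left or right accounts for almost all of its occurrences. The function names and the parameters `max_len`, `min_freq`, and `ratio` are hypothetical.

```python
from collections import Counter


def count_ngrams(texts, max_len):
    """Count every character n-gram of length 1..max_len in the texts."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def extract_terms(texts, max_len=4, min_freq=2, ratio=0.9):
    """Return {term: frequency} for n-grams that look like complete terms.

    Assumption (not from the paper): an n-gram has a clean left (right)
    boundary if its most frequent one-character left (right) extension
    covers less than `ratio` of the n-gram's own occurrences.
    """
    # Count one character beyond max_len so extensions of the longest
    # candidates are available.
    counts = count_ngrams(texts, max_len + 1)

    # For each n-gram, record the highest count among its one-character
    # left extensions and among its one-character right extensions.
    best_left = Counter()
    best_right = Counter()
    for gram, c in counts.items():
        if len(gram) < 2:
            continue
        best_left[gram[1:]] = max(best_left[gram[1:]], c)
        best_right[gram[:-1]] = max(best_right[gram[:-1]], c)

    terms = {}
    for gram, c in counts.items():
        if not 2 <= len(gram) <= max_len or c < min_freq:
            continue
        # Reject fragments that almost always occur inside one longer string.
        if best_left[gram] >= ratio * c or best_right[gram] >= ratio * c:
            continue
        terms[gram] = c
    return terms


if __name__ == "__main__":
    sample = ["買計算機了", "用計算機吧", "計算機好"]
    print(extract_terms(sample))
```

On the three sample strings, the fragments 計算 and 算機 are rejected because they always occur inside 計算機, while 計算機 itself is kept because its neighbouring characters vary. Everything here runs in one pass over substring counts, which is the kind of property a scalable approach would need, but the thresholds are illustrative only.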
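This paper studies Chinese Web text classification techniques. It introduces the basic process of Web text classification and methods for Web text preprocessing and text feature selection, with a focus on KNN, a commonly used content-based classification algorithm. Finally, experiments test the Chinese We... that uses the KNN algorithm...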
Automatic indexing is the automatic creation of a text surrogate, normally keywords or phrases, to r...
This paper focuses on the high-dimensional text problems encountered in text classification. Document...
The goal of this paper is to derive extra representatives from each class to compensate ...
Full text / Division of Systems Engineering, Graduate School of Engineering, Mie University. Automatic text cla...
The process of text categorization involves some understanding of the content of the doc...
Mie University, Graduate School of Engineering, Master's Program, Department of Information Engineering. Automatic text classification (ATC) is the task of automatically assigning one ...
Words and n-grams are commonly used units for representing Chinese text and have proved to be good featur...
Recently, research on text mining has attracted a lot of attention from both industrial an...
Text pre-processing is an important component of Chinese text classification. At present, however,...
This paper reports our comparative evaluation of three machine learning methods, namely k ...