Text preprocessing is an important component of Chinese text classification. At present, however, most studies on this topic explore the influence of preprocessing methods on only a few text classification algorithms, and they use English text. In this paper, we experimentally compared fifteen commonly used classifiers on two Chinese datasets using three widely used Chinese preprocessing methods: word segmentation, Chinese-specific stop word removal, and Chinese-specific symbol removal. We then explored the influence of these preprocessing methods on the final classification results under various conditions, such as classification evaluation, combination style, and classifier selection. Finally, we conducted a battery of var...
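The three preprocessing steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the segmenter below is plain forward maximum matching over a tiny made-up lexicon, and `TOY_LEXICON`, `TOY_STOP_WORDS`, `SYMBOL_PATTERN`, and all function names are hypothetical. A real experiment would presumably rely on a full lexicon or a dedicated segmentation tool and a published stop-word list.

```python
import re

# Hypothetical toy resources, for illustration only.
TOY_LEXICON = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}
TOY_STOP_WORDS = {"的", "了", "和"}

# Assumption: a "symbol" is anything that is not a CJK unified ideograph
# (U+4E00..U+9FFF), an ASCII letter, or an ASCII digit.
SYMBOL_PATTERN = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def strip_symbols(text):
    """Symbol removal: split on symbol runs and keep the non-empty chunks."""
    return [chunk for chunk in SYMBOL_PATTERN.split(text) if chunk]


def segment(text, lexicon, max_len=4):
    """Word segmentation by forward maximum matching.

    At each position, take the longest lexicon entry (up to max_len
    characters); fall back to a single character if nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Stop word removal: drop tokens found in the stop-word set."""
    return [token for token in tokens if token not in stop_words]


def preprocess(text):
    """Apply symbol removal, segmentation, and stop word removal in order."""
    tokens = []
    for chunk in strip_symbols(text):
        tokens.extend(segment(chunk, TOY_LEXICON))
    return remove_stop_words(tokens, TOY_STOP_WORDS)


if __name__ == "__main__":
    print(preprocess("我们喜欢,自然语言的处理。"))
```

Splitting on symbols before segmenting keeps the matcher from joining characters across punctuation. Each step is a separate function, so the steps can be switched on and off independently when comparing combinations of them.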
Large alphabet languages such as Chinese present different problems for language modelling compared ...
Textual information written in Chinese now represents a huge knowledge repository. The first step of...
Words and n-grams are commonly used units for representing Chinese text and have proved to be good featur...
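For readers unfamiliar with n-gram units, the following is a generic sketch of character n-gram extraction. The abstract is truncated, so nothing here reflects how the cited work actually builds its features; `char_ngrams` and `ngram_counts` are hypothetical helper names.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the given length."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(text, n_values=(1, 2)):
    """Count character n-grams for each requested n in a single Counter."""
    counts = Counter()
    for n in n_values:
        counts.update(char_ngrams(text, n))
    return counts


if __name__ == "__main__":
    print(char_ngrams("自然语言", 2))
    print(ngram_counts("语言语言").most_common(3))
```

Unlike word units, character n-grams need no segmenter or lexicon, which is one reason they are a common baseline representation for Chinese text.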
Automatic text classification (ATC) is the task of automatically assigning one or more appropriate c...
Text classification (TC) is the task of automatically assigning documents to a fixed number of categ...
In a standard text classification (TC) study, preprocessing is one of the key components to improve ...
Graduate School of Engineering, Mie University (Master's Program, Division of Information Engineering). Automatic text classification (ATC) is the task of automatically assigning one ...
Considering the explosive growth of data, the increased amount of text data’s effect on the performa...
Effective feature selection is essential to make the learning task efficient and more accurate. In t...
In this paper, we propose and evaluate approaches to categorizing Chinese texts, which c...
Text classification aims to assign predefined labels to unlabeled sentences, which tend to struggle ...