Text pre-processing is an important component of Chinese text classification. At present, however, most studies on this topic explore the influence of preprocessing methods on only a few text classification algorithms, using English text. In this paper, we experimentally compared fifteen commonly used classifiers on two Chinese datasets using three widely used Chinese preprocessing methods: word segmentation, Chinese-specific stop word removal, and Chinese-specific symbol removal. We then explored the influence of the preprocessing methods on the final classifications under various conditions, such as classification evaluation, combination style, and classifier selection. Finally, we conducted a battery of var...
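The three preprocessing steps named in this abstract (symbol removal, word segmentation, and stop word removal) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the dictionary and stop word list are toy, hypothetical values, forward maximum matching stands in for whatever segmenter the study used, and the order in which the steps are applied is an assumption.

```python
import unicodedata

# Toy resources, hypothetical and for illustration only.
TOY_DICTIONARY = {"文本", "分类", "预处理", "中文", "重要"}
TOY_STOP_WORDS = {"的", "是", "很"}


def remove_symbols(text):
    """Drop punctuation, symbols, and space separators (ASCII and full-width)."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith(("P", "S", "Z"))
    )


def segment(text, dictionary=TOY_DICTIONARY, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words=TOY_STOP_WORDS):
    """Filter out tokens that appear in the stop word list."""
    return [t for t in tokens if t not in stop_words]


def preprocess(text):
    """Apply symbol removal, then segmentation, then stop word removal."""
    return remove_stop_words(segment(remove_symbols(text)))
```

Calling `preprocess` on a raw sentence returns a list of tokens that could then be fed to a bag-of-words or n-gram feature extractor. A real experiment would replace the toy dictionary with a full lexicon or a trained segmenter.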
Words and n-grams are commonly used units for representing Chinese text and have proved to be good featur...
Text pre-processing is the process of converting raw text data into a corpus (bag of words) which is fu...
Automatic text classification (ATC) is the task of automatically assigning one or more appropriate c...
In a standard text classification (TC) study, preprocessing is one of the key components to improve ...
Mie University, Graduate School of Engineering, Master's Program, Division of Information Engineering. Automatic text classification (ATC) is the task of automatically assigning one ...
Text classification (TC) is the task of automatically assigning documents to a fixed number of categ...
Text classification aims to assign predefined labels to unlabeled sentences, which tend to struggle ...
Considering the explosive growth of data, the increased amount of text data’s effect on the performa...
In this paper, we propose and evaluate approaches to categorizing Chinese texts, which c...
Text classification is of importance in natural language processing, as the massive text information...
Effective feature selection is essential to make the learning task efficient and more accurate. In t...