In many text classification applications, it is appealing to treat each document as a string of characters rather than a bag of words. Previous research in this area has mostly focused on variants of generative Markov chain models. Although discriminative machine learning methods such as the Support Vector Machine (SVM) have been quite successful in text classification with word features, it is neither effective nor efficient to apply them directly with all substrings in the corpus as features. In this paper, we propose to partition all substrings into statistical equivalence groups and then select the statistically important groups as features (named key-substring-group features) for text classific...
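The grouping idea in the abstract can be illustrated with a small brute-force sketch. It is not the paper's algorithm: it assumes that two substrings are "statistically equivalent" when they have identical per-document counts, and it uses a plain document-frequency threshold as a stand-in for statistical importance. The function names `substring_groups` and `key_groups` and the parameters `max_len` and `min_df` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def substring_groups(docs, max_len=4):
    """Group substrings (up to max_len chars) by their per-document counts.

    Assumption for this sketch: substrings with the same count vector
    over the corpus are treated as one equivalence group.
    """
    counts = defaultdict(lambda: [0] * len(docs))
    for i, doc in enumerate(docs):
        for start in range(len(doc)):
            last = min(start + max_len, len(doc))
            for end in range(start + 1, last + 1):
                counts[doc[start:end]][i] += 1

    groups = defaultdict(list)
    for sub, vec in counts.items():
        groups[tuple(vec)].append(sub)
    return groups


def key_groups(groups, min_df=2):
    """Keep groups that occur in at least min_df documents.

    Assumption: document frequency stands in for a real importance test.
    """
    return {
        vec: sorted(subs)
        for vec, subs in groups.items()
        if sum(1 for c in vec if c > 0) >= min_df
    }


# Toy corpus: "a", "b", and "ab" share the same count vector and collapse
# into a single feature.
docs = ["abab", "abc", "xyz"]
groups = substring_groups(docs, max_len=2)
selected = key_groups(groups, min_df=2)
```

Each selected group contributes one feature whose value for a document is the shared count, so the feature space grows with the number of groups rather than the number of distinct substrings. Enumerating substrings this way is quadratic in document length and is only practical for toy inputs.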
The text is nothing but the combination of characters. Therefore, analyzing and extracting informati...
Text genre classification is the process of identifying functional characteristics of text documents...
Abstract. A universal problem in text classification is the high dimensionality ...
We propose a novel approach for categorizing text documents based on the use of a special kernel. Th...
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, 474...
Support Vector Machines (SVM) can classify objects described by an effectively infinite-dimensional ...
Text data mining is the process of extracting and analyzing valuable information from text. A text d...
In this paper, we address the problem of dealing with a large collection of data and propose a met...
Due to the existence of a huge amount of textual data either on the World Wide Web or in textual databas...
Text mining and data mining support different kinds of algorithms for classification of large d...
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text cla...
Bulgarian National Science Fund; Bulgarian Section. 2019 IEEE International Symposium on INnovations in...
Data mining in text streams, or text stream mining, is an increasingly important topic for a numbe...
Kilimci, Zeynep Hilal (Dogus Author) -- Conference full title: IEEE International Symposium on INnov...