This paper applies Distributional Clustering (Pereira et al. 1993) to document classification. The approach clusters words into groups based on the distribution of class labels associated with each word. Because the clusters are informed by class labels, we are able to compress the feature space much more aggressively than unsupervised dimensionality-reduction techniques such as Latent Semantic Indexing, while still maintaining high document classification accuracy. Experimental results obtained on three real-world data sets show that we can reduce the feature dimensionality by three orders of magnitude and lose only 2% accuracy, significantly better than Latent Semantic Indexing (Deerwester et al. 1990), class-based clustering (Brown et al. 1992), feature selection b...
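The word-clustering step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Each word is represented by its empirical class distribution P(C | w), weighted by its corpus frequency P(w). Clusters are then merged greedily by the smallest weighted Jensen-Shannon divergence, which is one common merge criterion for distributional clustering; the snippet does not say which criterion the paper uses. The function names (`class_distributions`, `weighted_js`, `merge`, `distributional_clusters`), the naive all-pairs search, and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_distributions(docs):
    """Estimate P(C | w) and the word prior P(w) from (tokens, label) pairs."""
    word_class = defaultdict(Counter)
    for tokens, label in docs:
        for w in tokens:
            word_class[w][label] += 1
    total = sum(sum(counts.values()) for counts in word_class.values())
    dists, priors = {}, {}
    for w, counts in word_class.items():
        n = sum(counts.values())
        dists[w] = {c: k / n for c, k in counts.items()}
        priors[w] = n / total
    return dists, priors


def kl(p, q):
    """KL divergence D(p || q); q must be positive wherever p is."""
    return sum(pv * math.log(pv / q[c]) for c, pv in p.items() if pv > 0)


def weighted_js(p1, w1, p2, w2):
    """JS divergence of p1, p2 mixed by their masses, scaled by total mass."""
    total = w1 + w2
    pi1, pi2 = w1 / total, w2 / total
    labels = set(p1) | set(p2)
    mean = {c: pi1 * p1.get(c, 0.0) + pi2 * p2.get(c, 0.0) for c in labels}
    return total * (pi1 * kl(p1, mean) + pi2 * kl(p2, mean))


def merge(p1, w1, p2, w2):
    """Class distribution of the union of two clusters (mass-weighted average)."""
    w = w1 + w2
    labels = set(p1) | set(p2)
    return {c: (w1 * p1.get(c, 0.0) + w2 * p2.get(c, 0.0)) / w for c in labels}, w


def distributional_clusters(docs, k):
    """Greedily merge words until k clusters remain; returns sorted word lists."""
    dists, priors = class_distributions(docs)
    clusters = [({w}, dists[w], priors[w]) for w in dists]
    while len(clusters) > k:
        best = None
        # Naive search over all cluster pairs for the cheapest merge.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = weighted_js(clusters[i][1], clusters[i][2],
                                clusters[j][1], clusters[j][2])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        words_i, p_i, w_i = clusters[i]
        words_j, p_j, w_j = clusters[j]
        p, w = merge(p_i, w_i, p_j, w_j)
        clusters[i] = (words_i | words_j, p, w)
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(words) for words, _, _ in clusters]


# Hypothetical toy corpus: (tokens, class label) pairs.
docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "score", "goal"], "sports"),
    (["vote", "election", "party"], "politics"),
    (["party", "vote", "senate"], "politics"),
]
print(distributional_clusters(docs, k=2))
# [['goal', 'match', 'score', 'team'], ['election', 'party', 'senate', 'vote']]
```

Replacing each word by its cluster ID shrinks the vocabulary to k features. Words with identical class distributions merge at zero cost, so the toy run recovers one cluster per class. The all-pairs search costs O(n^2) per merge and is only meant for tiny vocabularies.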
Clustering is one of the most researched areas of data mining applications in the contemporary liter...
Abstract. Text document clustering is a popular task for understanding and summarizing large docume...
Distributional Clustering to document classification. This approach clusters words into groups based on ...
High dimensionality of text can be a deterrent to applying complex learners such as Support Vector ...
We study an approach to text categorization that combines distributional clustering of words and a S...
In this paper, a comparative analysis of text document clustering algorithms based on latent semanti...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...
Text classification assigns predefined categories to natural language text. It...
Text Categorization is traditionally done by using the term frequency and inverse document frequency...
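The tf-idf weighting mentioned above has several variants. The sketch below assumes the plain form w(t, d) = tf(t, d) · log(N / df(t)), where tf(t, d) is the raw count of term t in document d, N is the number of documents, and df(t) is the number of documents containing t. The function name `tfidf` and the two-document corpus are illustrative only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using raw tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term at most once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: count * math.log(n_docs / df[t]) for t, count in tf.items()})
    return vectors


corpus = [
    ["text", "categorization", "text"],
    ["text", "clustering"],
]
for vec in tfidf(corpus):
    print(vec)
```

A term that occurs in every document gets weight 0 under this variant, which is how idf down-weights words that do not discriminate between documents.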
We describe and experimentally evaluate a method for automatically clustering words according to the...
We present a novel implementation of the recently introduced information bottleneck method for unsup...
We propose a novel document clustering method, which aims to cluster the documents into different s...
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text cla...
The purpose of text clustering in information retrieval is to discover groups of semantically relate...