10th International Conference on Applications of Natural Language to Information Systems, NLDB 2005, Alicante, Spain, June 15-17, 2005
The distribution of the number of documents across topic classes is typically highly skewed. This leads to good micro-average performance but less satisfactory macro-average performance. Viewing topics as clusters in a high-dimensional space, we propose using clustering to determine subtopic clusters for large topic classes, on the assumption that large topic clusters are in general a mixture of several subtopic clusters. We used Reuters news articles and support vector machines to evaluate whether using subtopic clusters can lead to better macro-average performance.
Department of Computin
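The gap between micro-average and macro-average performance on skewed topic classes can be illustrated with a small worked example. The sketch below is not the paper's evaluation code: the topic names and the per-topic counts of true positives, false positives, and false negatives are hypothetical, chosen only to mimic one large topic alongside two small ones. It uses F1 as the measure, which is an assumption; the abstract does not say which measure is averaged.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one topic
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(per_topic: Dict[str, Counts]) -> Tuple[float, float]:
    """Return (micro-average F1, macro-average F1) over all topics."""
    # Micro-average: pool the counts first, so large topics dominate.
    tp = sum(c[0] for c in per_topic.values())
    fp = sum(c[1] for c in per_topic.values())
    fn = sum(c[2] for c in per_topic.values())
    micro = f1(tp, fp, fn)
    # Macro-average: score each topic first, so every topic weighs the same.
    macro = sum(f1(*c) for c in per_topic.values()) / len(per_topic)
    return micro, macro


# Hypothetical counts: one large, well-classified topic and two small, poorly classified ones.
hypothetical_counts: Dict[str, Counts] = {
    "large_topic": (900, 50, 100),
    "small_topic_a": (2, 3, 8),
    "small_topic_b": (1, 4, 9),
}

micro, macro = micro_macro_f1(hypothetical_counts)
print(f"micro-average F1 = {micro:.3f}")
print(f"macro-average F1 = {macro:.3f}")
```

With these invented counts the pooled (micro) score stays close to that of the large topic, while the per-topic (macro) score is pulled down by the two small topics. This is the imbalance the abstract describes.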
Text clustering is a useful and inexpensive way to organize vast text repositories into meaningful t...
Production of news content is growing at an astonishing rate. To help manage and monitor the sheer a...
Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases...
We study an approach to text categorization that combines distributional clustering of words and a S...
In this paper, we introduce a new clustering algorithm for discovering and describing the topics com...
The exponential growth of the size and popularity of the world wide web has increased the interest i...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...
Topic models provide a useful tool to organize and understand the structure of large corpora of text...
Abstract: "The world wide web represents vast stores of information. However, the sheer amount of su...
We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, ...
Master of Science. Department of Computer Science. William Hsu. This work describes a comparative study of...
Data accumulate and there is a growing need for automated systems for partitioning data into groups, ...
DMNLP co-located with the European Conference on Machine Learning and Principles and Practice of Kno...
It is crucial in many information systems to organize short text segments, such as keywords in docum...
The breakneck progress of computers and the web makes it easier to collect and store large amounts of infor...