We address the problem of unsupervised classification of documents into a given hierarchy of concepts with few unlabeled examples. In contrast to previous approaches, where only the leaves of the hierarchy represent valid classes, we consider the situation where documents must also be classified into internal nodes. We claim that the relationships between classes are part of the prior knowledge that can be used to improve model accuracy. We present modified versions of the K-means and EM clustering algorithms that exploit the structure of the hierarchy to obtain robust estimates and improve classification accuracy. This is accomplished by smoothing the distributions of the classes according to the taxonomy at each iteration of th...
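The smoothing idea described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the smoothing formula, so everything specific here is an assumption made for illustration: the function names (`smooth`, `hierarchical_kmeans`), the mixing weight `lam`, the use of squared Euclidean distance, the seed centroids in `init`, and the choice to shrink each class centroid toward its parent's centroid. It is not the authors' algorithm, only one plausible instance of K-means with a taxonomy-based smoothing step after each update.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector], dim: int) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def smooth(centroids: Dict[str, Vector],
           parent: Dict[str, Optional[str]],
           lam: float) -> Dict[str, Vector]:
    """Hypothetical smoothing rule: mix each centroid with its parent's
    (unsmoothed) centroid using weight lam. The root is left unchanged."""
    out = {}
    for c, vec in centroids.items():
        p = parent[c]
        if p is None:
            out[c] = list(vec)
        else:
            out[c] = [(1 - lam) * x + lam * y
                      for x, y in zip(vec, centroids[p])]
    return out


def hierarchical_kmeans(docs: List[Vector],
                        init: Dict[str, Vector],
                        parent: Dict[str, Optional[str]],
                        lam: float = 0.3,
                        iters: int = 10) -> Tuple[List[str], Dict[str, Vector]]:
    """K-means over all taxonomy nodes with a smoothing step per iteration."""
    centroids = {c: list(v) for c, v in init.items()}
    dim = len(docs[0])
    labels: List[str] = []
    for _ in range(iters):
        # Assignment: every node, internal or leaf, is a valid class.
        labels = [min(centroids, key=lambda c: sq_dist(d, centroids[c]))
                  for d in docs]
        # Update: a class with no members keeps its previous centroid.
        updated = {}
        for c in centroids:
            members = [d for d, l in zip(docs, labels) if l == c]
            updated[c] = mean(members, dim) if members else centroids[c]
        # Smoothing: pull each class toward its parent in the taxonomy.
        centroids = smooth(updated, parent, lam)
    return labels, centroids
```

With `lam = 0` the procedure reduces to ordinary K-means; larger values tie sparsely populated classes more strongly to their parents, which is the intuition behind using the hierarchy to stabilize estimates when few documents are available.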
Fast and high-quality document clustering algorithms play an important role in providing intuitive n...
Hierarchical supervised classifiers are highly demanding in terms of labelled examples, because the...
Vast amounts of text documents are available in various fields. The accumulations of available text ...
We study unsupervised classification of text documents into a taxonomy of concepts annotated by only...
The proliferation of text documents on the web as well as within institutions necessitates their con...
The management of hierarchically organized data is starting to play a key role in the knowledge manag...
growing interest due to the widespread proliferation of topic hierarchies for text documents. The wo...
The management of hierarchically organized data is starting to play a key role in the knowledge mana...
The need to classify text documents within topic hierarchies has given rise to techniques that use t...
Obtaining hierarchical organizations of knowledge is important in many domains. To create such hiera...
Managing the hierarchical organization of data is starting to play a key role in the knowledge manag...
We propose an unsupervised divisive partitioning algorithm for document data sets which enjoys many ...
Nowadays, the explosive growth in text data emphasizes the need for developing new and computational...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...