We study unsupervised classification of text documents into a taxonomy of concepts, each annotated by only a few keywords. Our central claim is that the structure of the taxonomy encapsulates background knowledge that can be exploited to improve classification accuracy. Under our hierarchical Dirichlet generative model for the document corpus, we show that the unsupervised classification algorithm provides robust estimates of the classification parameters by performing regularization, and that our algorithm can be interpreted as a regularized EM algorithm. We also propose a technique for automatically choosing the regularization parameter. In addition, we propose a regularization scheme for K-means for hierarchies. We experimentally demonstrate t...
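To make the idea of regularizing K-means with a hierarchy concrete, here is a minimal sketch under stated assumptions; it is not the paper's algorithm, whose details the abstract does not give. It assumes that each leaf concept has a centroid seeded from its keywords, that the parent centroid is the mean of its children's centroids, and that regularization takes the form of a squared-distance penalty pulling each leaf centroid toward its parent. The function name `regularized_kmeans`, the `parent_of` mapping, and the strength parameter `lam` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Vector], dim: int) -> Vector:
    """Component-wise mean; a zero vector if the input is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def regularized_kmeans(
    docs: Sequence[Vector],
    init_centroids: Dict[str, Vector],  # assumed: seeded from concept keywords
    parent_of: Dict[str, str],          # assumed: leaf concept -> parent concept
    lam: float = 1.0,                   # assumed: regularization strength
    n_iter: int = 10,
) -> Tuple[Dict[str, Vector], List[str]]:
    dim = len(docs[0])
    centroids = {k: list(v) for k, v in init_centroids.items()}
    assignment: List[str] = []
    for _ in range(n_iter):
        # Assignment step: each document goes to its nearest leaf centroid.
        assignment = [
            min(centroids, key=lambda k: sq_dist(d, centroids[k])) for d in docs
        ]
        # Parent centroid: mean of the current centroids of its children.
        children: Dict[str, List[Vector]] = {}
        for leaf, parent in parent_of.items():
            children.setdefault(parent, []).append(centroids[leaf])
        parent_centroid = {p: mean(vs, dim) for p, vs in children.items()}
        # Update step: minimize sum ||d - c||^2 + lam * ||c - parent||^2,
        # whose closed form is c = (n * mean + lam * parent) / (n + lam).
        for leaf in centroids:
            members = [d for d, a in zip(docs, assignment) if a == leaf]
            n = len(members)
            if n + lam == 0:
                continue  # nothing to estimate from; keep the old centroid
            m = mean(members, dim)
            p = parent_centroid[parent_of[leaf]]
            centroids[leaf] = [
                (n * m[i] + lam * p[i]) / (n + lam) for i in range(dim)
            ]
    return centroids, assignment
```

With `lam = 0` the update reduces to the ordinary K-means centroid; larger values shrink sparsely populated concepts toward their siblings, which is one simple way a taxonomy could stabilize estimates when little evidence is available per concept.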
We propose a new algorithm for dimensionality reduction and unsupervised text classif...
The task of text classification is the assignment of labels that describe texts’ char-act...
The spread and abundance of electronic documents require automatic techniques for extract...
The proliferation of text documents on the web as well as within institutions necessitates their con...
The proliferation of text documents on the web as well as within institutions necessitates their con...
We address the problem of unsupervised classification of documents into a given hierarchy of concept...
This report addresses the problem of learning a taxonomy from a given domain-specific text corpus. W...
Taxonomies hierarchically organize concepts in a domain. Building and maintaining them by hand is a ...
Supervised and unsupervised learning have been the focus of critical research in the areas of machin...
We introduce three linguistically motivated structured regularizers based on parse trees, topics, a...
This paper proposes a framework to automatically construct taxonomies from a corpus of text document...
Organizing textual documents into a hierarchical taxonomy is a common practice in knowledge manageme...
growing interest due to the widespread proliferation of topic hierarchies for text documents. The wo...
We propose an unsupervised divisive partitioning algorithm for document data sets which enjoys many ...
We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously c...