In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to severa...
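As a rough illustration of the first experimental setting, the sketch below derives pairwise constraints from a partial set of category labels. The abstract does not say how the constraints are built, so the rule used here (same label gives a must-link pair, different labels give a cannot-link pair) is an assumption borrowed from common constrained-clustering practice. The function name `derive_constraints` and the toy labels are hypothetical.

```python
from itertools import combinations


def derive_constraints(labels):
    """Return (must_link, cannot_link) index pairs from a partial labeling.

    `labels` maps an item index (document or word) to a category label;
    unlabeled items are simply absent from the mapping.
    Assumption: same label -> must-link, different labels -> cannot-link.
    """
    must_link, cannot_link = [], []
    # Visit every unordered pair of labeled items exactly once.
    for (i, label_i), (j, label_j) in combinations(sorted(labels.items()), 2):
        if label_i == label_j:
            must_link.append((i, j))
        else:
            cannot_link.append((i, j))
    return must_link, cannot_link


# Toy example: items 0 and 1 share a category, item 3 differs, item 2 is unlabeled.
labels = {0: "sports", 1: "sports", 3: "politics"}
must_link, cannot_link = derive_constraints(labels)
```

The same helper could be applied separately to documents and to words, giving the two constraint sets that a two-sided model would consume.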
In this paper, we present a novel graph-theoretic approach to the problem of document-word co-clust...
We propose a novel diagonal co-clustering algorithm built upon the double Kmea...
Most document clustering algorithms operate in a high dimensional bag-of-words space. The inherent p...
Co-clustering is more useful than one-sided clustering when dealing with high ...
We present a novel implementation of the recently introduced information bottleneck method for unsup...
Constrained clustering is a recently presented family of semi-supervised learning algorith...
Recently, different studies have demonstrated the use of co-clustering, a data...
This paper follows a word-document co-clustering model independently introduced in 2001 by several a...
Most existing semi-supervised document clustering approaches are model-based clustering and can be t...
Document clustering without any prior knowledge or background information is a challenging problem. ...
Recently, semi-supervised clustering has received a lot of attention....
With the development of statistical machine translation, we have ready-to-use tools that can transla...
Fast and high-quality document clustering algorithms play an important role in providing intuitive n...
This paper discusses a new type of semi-supervised document clustering that uses partial supervisio...