Co-clustering is more useful than one-sided clustering when dealing with high-dimensional sparse data. We propose to address document clustering with a generative model-based co-clustering approach. To this end, we rely on a particular mixture of von Mises-Fisher distributions and propose a new parsimonious model that reveals a block-diagonal structure as well as a good partition of documents and terms. Then, by casting the estimation of the model parameters in the maximum likelihood (ML) framework, we derive three novel co-clustering algorithms: a soft one and two stochastic variants. Empirical results on numerous simulated and real-world datasets demonstrate the advantages of our approach to mode...
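The soft estimation approach described above rests on a mixture of von Mises-Fisher (vMF) distributions over L2-normalized document vectors. The sketch below runs soft EM for a plain, one-sided vMF mixture and is meant only as an illustration of that building block. It is not the authors' co-clustering algorithm, which additionally partitions terms to obtain the block-diagonal structure. The shared, fixed concentration `kappa`, the farthest-first initialization, and all function names (`soft_vmf_em`, `normalize`, `dot`) are hypothetical choices made for this sketch.

```python
import math


def dot(u, v):
    """Inner product; for unit vectors this is the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm so it lies on the unit hypersphere."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else list(v)


def soft_vmf_em(X, k, kappa=10.0, n_iter=20):
    """Soft EM for a k-component vMF mixture with one shared, fixed kappa.

    Because every component uses the same kappa (an assumption of this
    sketch), the normalizing constant C_d(kappa) cancels in the posterior:
    p(h | x_i) is proportional to pi_h * exp(kappa * mu_h . x_i).
    """
    X = [normalize(x) for x in X]
    # Hypothetical farthest-first initialization, chosen for determinism:
    # start from the first document, then repeatedly add the document that
    # is least similar to every centroid chosen so far.
    mus = [list(X[0])]
    while len(mus) < k:
        mus.append(list(min(X, key=lambda x: max(dot(m, x) for m in mus))))
    pis = [1.0 / k] * k
    T = []
    for _ in range(n_iter):
        # E-step: posterior memberships, using log-sum-exp for stability.
        T = []
        for x in X:
            logits = [math.log(pis[h]) + kappa * dot(mus[h], x) for h in range(k)]
            top = max(logits)
            w = [math.exp(z - top) for z in logits]
            s = sum(w)
            T.append([wi / s for wi in w])
        # M-step: mixing proportions and mean directions
        # (each mean direction is a normalized, posterior-weighted sum).
        for h in range(k):
            col = [t[h] for t in T]
            pis[h] = max(sum(col) / len(X), 1e-12)  # guard against log(0)
            r = [sum(c * x[j] for c, x in zip(col, X)) for j in range(len(X[0]))]
            mus[h] = normalize(r)
    return T, mus, pis
```

For example, `soft_vmf_em([[3, 1, 0, 0], [4, 2, 0, 0], [0, 0, 2, 5], [0, 1, 3, 4]], 2)` should place the first two documents in one component and the last two in the other. Keeping `kappa` fixed reduces the M-step to a normalized weighted sum. Estimating it by maximum likelihood would instead require approximating the inverse of a ratio of modified Bessel functions.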
This paper follows a word-document co-clustering model independently introduce...
We propose a novel diagonal co-clustering algorithm built upon the double Kmea...
In this paper we propose an extension of the PLSA model in which an extra late...
Due to the coronavirus pandemic, the event was postponed to 7 to 11 June 2021 and...
Recently, different studies have demonstrated the use of co-clustering, a data...
Many of the datasets encountered in statistics are two-dimensional in nature a...
This paper presents a detailed empirical study of twelve generative approaches to text clustering ob...
We propose a novel model based on the von Mises-Fisher (vMF) distribution for ...
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our...
Co-clustering of document-term matrices has proved to be more effective than o...
Generative models based on the multivariate Bernoulli and multinomial distributions have been widely...
Most document clustering algorithms operate in a high-dimensional bag-of-words space. The inherent p...
The simultaneous clustering of documents and words, known as co-clustering, ha...
Model-based co-clustering can be seen as a particularly valuable extension of model-based clustering...
Methods for high-dimensional data clustering represent a prolific research area in data mining, enc...