Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are performed simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Boulton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and sele...
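The idea of estimating the mixture and choosing the number of clusters in a single run can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes integer-coded categorical features that are independent within each cluster, uses a plain batch EM rather than a component-wise one, and keeps only the MML-motivated weight update from the Figueiredo and Jain style of algorithm, in which a component whose support falls below half its number of free parameters is dropped. The function name `fit_multinomial_mixture_mml` and all of its parameters are hypothetical.

```python
import numpy as np


def fit_multinomial_mixture_mml(X, n_levels, k_max=6, n_iter=200, seed=0):
    """Simplified EM for a mixture of categorical distributions.

    X        : (n, d) integer array, X[i, j] in {0, ..., n_levels[j] - 1}
    n_levels : number of categories of each of the d features
    k_max    : deliberately generous initial number of components
    Returns (weights, theta, labels) for the surviving components.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Free parameters per component: (L_j - 1) probabilities per feature.
    n_params = sum(L - 1 for L in n_levels)
    # One-hot encode each feature once; onehots[j] has shape (n, L_j).
    onehots = [np.eye(L)[X[:, j]] for j, L in enumerate(n_levels)]

    weights = np.full(k_max, 1.0 / k_max)
    # theta[j][k, c] = P(feature j takes category c | component k)
    theta = [rng.dirichlet(np.ones(L), size=k_max) for L in n_levels]

    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        log_r = np.log(weights)[None, :]
        for oh, th in zip(onehots, theta):
            log_r = log_r + oh @ np.log(th).T
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step for the weights: subtract half the parameter count and
        # clip at zero, which removes weakly supported components.
        support = np.maximum(0.0, resp.sum(axis=0) - n_params / 2.0)
        if support.sum() == 0.0:
            break  # every component would vanish; keep the current fit
        keep = support > 0.0
        weights = support[keep] / support[keep].sum()
        resp = resp[:, keep]

        # M-step for the category probabilities (lightly smoothed so that
        # no probability is exactly zero before taking logs).
        theta = [resp.T @ oh + 1e-6 for oh in onehots]
        theta = [t / t.sum(axis=1, keepdims=True) for t in theta]

    return weights, theta, resp.argmax(axis=1)
```

Calling the function with a value of `k_max` larger than the expected number of clusters lets the weight update discard superfluous components during estimation; the length of the returned `weights` is then the estimated number of clusters under these simplifying assumptions.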
In clustering, one may be interested in the classification of similar objects into groups, and one m...
In model-based clustering, each cluster is modelled by a parametrised probabil...
Clustering is one of the important approaches for Clustering enables the grouping of unlabeled data ...
Cluster analysis for categorical data has been an active area of research. A well-known problem in ...
Research on the problem of feature selection for clustering continues to develop. This is a challeng...
Assuming that the data originate from a finite mixture of multinomial distributions, we study the pe...
In data clustering, the problem of selecting the subset of most relevant features from the data has ...
Clustering is a partition of data into a group of similar or dissimilar data points and ea...
Categorical data has always posed a challenge in data analysis through clustering. With the increasi...
A model-based approach is developed for clustering categorical data with no natural ordering. The pr...
34 pages, 11 figures. Count data is becoming more and more ubiquitous in a wide ...
In this study, we consider unsupervised clustering of categorical vectors that can be of different s...
Automatically determining the number of clusters in the data is an unsolved/unexplored problem. First I&...
Clustering is a widely used statistical tool to determine subsets in a given data set. Frequently us...
31 pages, 10 figures. Count data is becoming more and more ubiquitous in a wide range of applications,...