Mixtures of von Mises-Fisher distributions have been shown to be an effective model for clustering data on the unit hypersphere, but variable selection for these models remains an important and challenging problem. In this paper, we derive two variants of the expectation-maximization framework, each of which is used to identify a specific type of irrelevant variable for these models. The first type consists of noise variables, which are not useful for separating any pair of clusters. The second type consists of redundant variables, which may be useful for separating pairs of clusters but do not enable any separation beyond that already provided by other variables. Removing these irrelevant variables is shown to improve cluster quality in...
Statistical analysis of data sets of high-dimensionality has met great interest over the past years,...
Clustering remains a vibrant area of research in statistics. Although there are many books on this t...
We consider the problem of variable or feature selection for model-based clustering. The problem of ...
Variable selection for clustering is an important and challenging problem in high-dimensional data a...
We compare two major approaches to variable selection in clustering: model sel...
Variable selection for cluster analysis is a difficult problem. The difficulty originates not only f...
There are statistical modeling situations for which the classical problem of classi...
Variable selection and other dimensionality reduction methods are more important than ever before. D...