Variable Selection for Model-Based Clustering

Adrian E. Raftery
Nema Dean

Publication date

January 2006

Abstract

We consider the problem of variable or feature selection for model-based clustering. The problem of comparing two nested subsets of variables is recast as a model comparison problem and addressed using approximate Bayes factors. A greedy search algorithm is proposed for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples and found that removing irrelevant variables often improved performance. Compared with methods based on all of the variables, our variable selection method consistently yielded more accurate estimates of the number of groups and lower classification error rat...
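The greedy search described in the abstract can be sketched as a stepwise inclusion/exclusion loop over variable subsets. This is a minimal sketch, not the authors' implementation. The function name `greedy_select`, the scoring callback `bic_diff`, and the toy scores are hypothetical. The sketch assumes that each nested-subset comparison has already been reduced to a single number, for example a BIC-based approximation to twice the log Bayes factor, where positive values favour treating the candidate variable as informative about clustering. Fitting the clustering models and choosing the number of clusters are left out.

```python
from typing import Callable, FrozenSet, Sequence, Set

# Hypothetical callback: bic_diff(v, S) returns an approximate 2*log Bayes factor
# for "v is a clustering variable" versus "v is not", given the selected set S.
# Positive values favour including v. How it is computed is an assumption left open.
ScoreFn = Callable[[str, FrozenSet[str]], float]


def greedy_select(variables: Sequence[str], bic_diff: ScoreFn,
                  max_iter: int = 100) -> Set[str]:
    """Greedy stepwise inclusion/exclusion search (illustrative sketch)."""
    selected: Set[str] = set()
    for _ in range(max_iter):  # guard against cycling between subsets
        changed = False

        # Inclusion step: propose the outside variable with the strongest evidence.
        outside = [v for v in variables if v not in selected]
        if outside:
            best = max(outside, key=lambda v: bic_diff(v, frozenset(selected)))
            if bic_diff(best, frozenset(selected)) > 0:
                selected.add(best)
                changed = True

        # Exclusion step: propose dropping the selected variable with the weakest
        # evidence, each scored against the selected set without it.
        if len(selected) > 1:
            worst = min(sorted(selected),
                        key=lambda v: bic_diff(v, frozenset(selected - {v})))
            if bic_diff(worst, frozenset(selected - {worst})) <= 0:
                selected.remove(worst)
                changed = True

        # Stop at a local optimum: neither step changed the subset.
        if not changed:
            break
    return selected


# Toy scores, invented for illustration: x1 and x2 carry cluster structure, x3 is noise.
TOY = {"x1": 12.0, "x2": 8.0, "x3": -3.0}


def toy_bic_diff(v: str, selected: FrozenSet[str]) -> float:
    return TOY[v]


print(sorted(greedy_select(["x1", "x2", "x3"], toy_bic_diff)))  # ['x1', 'x2']
```

Because each step is accepted only when the score favours it, the loop ends at a local optimum in model space rather than guaranteeing the best possible subset, which is consistent with the abstract's description of the search.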
