Classical model-based partitional clustering algorithms, such as k-means or mixture of Gaussians, provide only loose and indirect control over the size of the resulting clusters. In this work, we present a family of probabilistic clustering models that can be steered towards clusters of desired size by providing a prior distribution over the possible sizes, allowing the analyst to fine-tune exploratory analysis or to produce clusters of suitable size for future down-stream processing. Our formulation supports arbitrary multimodal prior distributions, generalizing the previous work on clustering algorithms searching for clusters of equal size or algorithms designed for the microclustering task of finding small clusters. We provide practical ...
Recent advances in Bayesian models for random partitions have led to the formulation and exploration...
Predictive clustering is a new supervised learning framework derived from traditional clustering. Th...
Probabilistic modeling for data mining and machine learning problems is a fundamental research area....
Classical model-based partitional clustering algorithms, such as k-means or mixture of Gaussians, pr...
Abstract Microclustering refers to clustering models that produce small clusters or, equivalently, t...
2014-2015 > Academic research: refereed > Refereed conference paperAccepted ManuscriptPublishe
Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many meth...
Deciding the number of clusters k is one of the most difficult problems in clus- ter analysis. For ...
Most generative models for clustering implicitly assume that the number of data points in each clust...
Deciding the number of clusters k is one of the most difficult problems in Cluster Analysis. For th...
One of the common problems with clustering is that the generated clusters often do not match user ex...
The probabilistic distance clustering method of [1] works well if the cluster sizes are approximatel...
Bayesian nonparametric mixture models are widely used to cluster observations. However, one major dr...
A new dissimilarity measure for cluster analysis is presented and used in the context of probabilist...
In cluster analysis interest lies in probabilistically capturing partitions of individuals, items o...
Recent advances in Bayesian models for random partitions have led to the formulation and exploration...
Predictive clustering is a new supervised learning framework derived from traditional clustering. Th...
Probabilistic modeling for data mining and machine learning problems is a fundamental research area....
Classical model-based partitional clustering algorithms, such as k-means or mixture of Gaussians, pr...
Abstract Microclustering refers to clustering models that produce small clusters or, equivalently, t...
2014-2015 > Academic research: refereed > Refereed conference paperAccepted ManuscriptPublishe
Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many meth...
Deciding the number of clusters k is one of the most difficult problems in clus- ter analysis. For ...
Most generative models for clustering implicitly assume that the number of data points in each clust...
Deciding the number of clusters k is one of the most difficult problems in Cluster Analysis. For th...
One of the common problems with clustering is that the generated clusters often do not match user ex...
The probabilistic distance clustering method of [1] works well if the cluster sizes are approximatel...
Bayesian nonparametric mixture models are widely used to cluster observations. However, one major dr...
A new dissimilarity measure for cluster analysis is presented and used in the context of probabilist...
In cluster analysis interest lies in probabilistically capturing partitions of individuals, items o...
Recent advances in Bayesian models for random partitions have led to the formulation and exploration...
Predictive clustering is a new supervised learning framework derived from traditional clustering. Th...
Probabilistic modeling for data mining and machine learning problems is a fundamental research area....