Abstract: Microclustering refers to clustering models that produce small clusters or, equivalently, to models in which cluster sizes grow sublinearly with the number of samples. We formulate probabilistic microclustering models by placing a prior distribution on the cluster sizes and, in particular, consider microclustering models with explicit bounds on cluster size. The combinatorial constraints complicate full Bayesian inference, but we develop a Gibbs sampling algorithm that efficiently samples from the joint cluster allocation of all data points. We empirically demonstrate the computational efficiency of the algorithm on problem instances of varying difficulty.
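The sublinear-growth condition in the abstract can be made concrete. One formalization common in the microclustering literature, offered here as an assumption since the abstract itself does not state it, writes $M_n$ for the size of the largest cluster in a partition of $n$ data points and requires that the ratio $M_n / n$ vanish in probability:

```latex
\frac{M_n}{n} \xrightarrow{\;p\;} 0 \quad \text{as } n \to \infty
```

Under an explicit bound $M_n \le B$ with $B$ fixed (the symbol $B$ is hypothetical, introduced only for this illustration), the condition holds trivially, since $M_n / n \le B / n \to 0$.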
Classical clustering problems search for a partition of objects into a fixed number of clusters. In ...
Bayesian nonparametric mixture models are widely used to cluster observations. However, one major dr...
The classical center-based clustering problems such as k-means/median/center assume that the optimal...
Classical model-based partitional clustering algorithms, such as k-means or mixture of Gaussians, pr...
Most generative models for clustering implicitly assume that the number of data points in each clust...
Joint work with Jeff Miller, Brenda Betancourt, Abbas Zaidi, Hanna Wallach, and Giacomo Zanella. Most...
Many popular random partition models, such as the Chinese restaurant process and its two-parameter e...
Deciding the number of clusters k is one of the most difficult problems in Cluster Analysis. For th...
Deciding the number of clusters k is one of the most difficult problems in cluster analysis. For ...
ABSTRACT: We consider the problem of clustering a set of points so as to minimize the maximum intra-...
Recent advances in Bayesian models for random partitions have led to the formulation and exploration...
Clustering is often formulated as a discrete optimization problem. The objecti...
We revisit recently proposed algorithms for probabilistic clustering with pair-wise constraints betw...
Determining the number of clusters in a dataset is a fundamental issue in data clustering. Many meth...