8 pages, 3 figures, conference. We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consists of $m$ points in $n$ dimensions, $n,m \rightarrow \infty$ and $\alpha = m/n$ stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of $\alpha$ and the distance between the clusters at which it becomes information-theoretically possible to reconstruct the membership into clusters better than chance. We also determine the accuracy achievable by the Bayes-optimal estimation algorithm. In particular, we find that when the number of clusters is sufficiently large, $r > 4 + 2 \sqrt{\alpha}$, there is a gap between the threshold for information-theoreticall...
In the big data era, data are typically collected at massive scales and often carry complex structur...
We present a novel algorithm called PG-means which is able to learn the number of clusters in a clas...
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate...
While several papers have investigated computationally and statistically efficient methods for le...
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that ...
We consider the problem of clustering data points in high dimensions, i.e. when the number of data p...
We study classical statistical problems such as community detection on graphs, Principal Component A...
We study classical statistical problems such as community detection on graphs, Principal Componen...
Clustering in high-dimensional spaces is a difficult problem which is recurrent in many domains, for...
We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gauss...
For every epsilon > 0, we give an efficient algorithm to learn the cluster centers of a mixture of p...
Spectral clustering is one of the most popular algorithms to group high-dimensional data. It is easy...
To avoid the curse of dimensionality, a common approach to clustering high-dimensional data is to fi...
We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally...