This paper studies the sample complexity of learning the $k$ unknown centers of a balanced Gaussian mixture model (GMM) in $\mathbb{R}^d$ with spherical covariance matrix $\sigma^2\mathbf{I}$. In particular, we ask the following question: what is the maximal noise level $\sigma^2$ for which the sample complexity is essentially the same as when estimating the centers from labeled measurements? To that end, we restrict attention to a Bayesian formulation of the problem, where the centers are uniformly distributed on the sphere $\sqrt{d}\mathcal{S}^{d-1}$. Our main results characterize the exact noise threshold $\sigma^2$ below which the GMM learning problem, in the large system limit $d,k\to\infty$, is as easy as learning from ...
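The setup described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative simulation under stated assumptions, not the paper's method: the function names (`sample_centers`, `sample_gmm`, `labeled_estimate`) and the parameter values are hypothetical, and the estimator shown is the plain per-component sample mean that serves as the labeled baseline the abstract compares against.

```python
import numpy as np

def sample_centers(k, d, rng):
    """Draw k centers uniformly on the sphere of radius sqrt(d) in R^d."""
    g = rng.standard_normal((k, d))
    # Normalizing an isotropic Gaussian vector gives a uniform direction.
    return np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)

def sample_gmm(centers, sigma2, n, rng):
    """Draw n points from the balanced mixture with covariance sigma2 * I."""
    k, d = centers.shape
    labels = rng.integers(0, k, size=n)  # balanced: uniform over components
    noise = np.sqrt(sigma2) * rng.standard_normal((n, d))
    return centers[labels] + noise, labels

def labeled_estimate(x, labels, k):
    """Per-component sample mean: the baseline when labels are observed."""
    return np.stack([x[labels == j].mean(axis=0) for j in range(k)])

# Hypothetical parameter values, chosen only to keep the example small.
rng = np.random.default_rng(0)
k, d, n, sigma2 = 3, 50, 3000, 1.0

centers = sample_centers(k, d, rng)
x, labels = sample_gmm(centers, sigma2, n, rng)
est = labeled_estimate(x, labels, k)

# Mean squared error per coordinate; roughly sigma2 * k / n for this baseline.
mse = np.mean((est - centers) ** 2)
print(f"per-coordinate MSE with labels: {mse:.5f}")
```

With labels, each center is estimated from about $n/k$ samples, so the per-coordinate error scales like $\sigma^2 k / n$. The question posed in the abstract is for which noise levels the unlabeled problem needs essentially no more samples than this.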
We obtain optimal Gaussian concentration bounds (GCBs) for stochastic chains o...
Presented on September 18, 2017 at 11:00 a.m. in the Klaus Advanced Computing Building, Room 1116E. I...
The hypothesis that high dimensional data tends to lie in the vicinity of a low dimensional manifol...
While several papers have investigated computationally and statistically efficient methods for le...
We show that, given data from a mixture of k well-separated spherical Gaussians in ℜ^d, a simple two...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
This paper determines to within a single measurement the minimum number of measurements re...
For every epsilon > 0, we give an efficient algorithm to learn the cluster centers of a mixture of p...
We consider the problem of identifying the parameters of an unknown mixture of two arbitrary d-dime...
Communication channels that are characterized by additive Gaussian noise have been well s...
In the first part of this thesis, we examine the computational complexity of three fundamental stati...
We consider the problem of Gaussian mixture clustering in the high-dim...
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate...
Infinite Gaussian mixture modeling (IGMM) is a modeling method that determines all the parameters of...