How can we train a statistical mixture model on a massive data set? In this work we show how to construct \emph{coresets} for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being \emph{independent} of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed both in distributed and streaming settings and do not impose restrictions...
We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians with-out ...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
University of AmsterdamWe present a deterministic greedy method to learn a mixture of Gaussians whic...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
Abstract How can we train a statistical mixture model on a massive data set? In this paper, we show ...
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate...
International audienceLearning parameters from voluminous data can be prohibitive in terms of memory...
Abstract Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We...
We study the problem of constructing coresets for clustering problems with time series data. This pr...
International audienceWhen performing a learning task on voluminous data, memory and computational t...
This paper represents a preliminary (pre-reviewing) version of a sublinear variational algorithm for...
Mixtures of gaussian (or normal) distributions arise in a variety of application areas. Many techniq...
In this paper we show that very large mixtures of Gaussians are efficiently learnable in high dimens...
Gaussian Mixture Models (GMM) are one of the most potent parametric density estimators based on the ...
We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians with-out ...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
University of AmsterdamWe present a deterministic greedy method to learn a mixture of Gaussians whic...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
Abstract How can we train a statistical mixture model on a massive data set? In this paper, we show ...
Given data drawn from a mixture of multivariate Gaussians, a basic problem is to accurately estimate...
International audienceLearning parameters from voluminous data can be prohibitive in terms of memory...
Abstract Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We...
We study the problem of constructing coresets for clustering problems with time series data. This pr...
International audienceWhen performing a learning task on voluminous data, memory and computational t...
This paper represents a preliminary (pre-reviewing) version of a sublinear variational algorithm for...
Mixtures of gaussian (or normal) distributions arise in a variety of application areas. Many techniq...
In this paper we show that very large mixtures of Gaussians are efficiently learnable in high dimens...
Gaussian Mixture Models (GMM) are one of the most potent parametric density estimators based on the ...
We provide an algorithm for properly learning mixtures of two single-dimensional Gaussians with-out ...
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Comp...
University of AmsterdamWe present a deterministic greedy method to learn a mixture of Gaussians whic...