Abstract: Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore the use of coresets, a data summarization technique originating from computational geometry, for this task. Coresets are weighted subsets of the data such that models trained on these coresets are provably competitive with models trained on the full dataset. Coresets sublinear in the dataset size allow for fast approximate inference with provable guarantees. Existing constructions, however, are limited to parametric problems. Using novel techniques in coreset construction, we show the existence of coresets for DP-Means, a prototypical nonparametric clustering problem, and provide a practical construction algorithm. We empirically demo...
Coresets are among the most popular paradigms for summarizing data. In particular, there exist many ...
The coreset paradigm is a fundamental tool for analysing complex and large datasets. Although corese...
Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtai...
Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore ...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
How can we train a statistical mixture model on a massive data set? In this paper, we show ...
Latent variable models provide a powerful framework for describing complex data by capturing its str...
We study the problem of constructing coresets for clustering problems with time series data. This pr...
Motivated by practical generalizations of the classic $k$-median and $k$-means objectives, such as c...
When one is faced with a dataset too large to be used all at once, an obvious ...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
In the era of datasets of unprecedented sizes, data compression techniques are an attractive approac...
Bayesian coresets approximate a posterior distribution by building a small weighted subset of the da...