Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore the use of coresets – a data summariza-tion technique originating from computational geometry – for this task. Coresets are weighted subsets of the data such that models trained on these coresets are provably competitive with models trained on the full dataset. Coresets sublinear in the dataset size allow for fast approximate inference with provable guarantees. Existing constructions, however, are limited to parametric problems. Using novel techniques in coreset construction we show the existence of coresets for DP-Means – a prototypical nonparametric clustering problem – and provide a practical construction algorithm. We empiri-cally demonst...
One desirable property of machine learning algorithms is the ability to balance the number of p...
Practical applications of Bayesian nonparamet-ric (BNP) models have been limited, due to their high ...
<p>Capturing high dimensional complex ensembles of data is becoming commonplace in a variety of appl...
Abstract Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We...
Abstract How can we train a statistical mixture model on a massive data set? In this paper, we show ...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
Latent variable models provide a powerful framework for describing complex data by capturing its str...
We describe an adaptation of the simulated annealing algorithm to nonparametric clus-tering and rela...
Subspace clustering separates data points ap-proximately lying on union of affine subspaces into sev...
Bayesian non-parametric models, despite their theoretical elegance, face a serious computational bur...
This thesis studies clustering problems on data streams, specifically with applications to metric sp...
We study the problem of constructing coresets for clustering problems with time series data. This pr...
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
One desirable property of machine learning algorithms is the ability to balance the number of p...
Practical applications of Bayesian nonparamet-ric (BNP) models have been limited, due to their high ...
<p>Capturing high dimensional complex ensembles of data is becoming commonplace in a variety of appl...
Abstract Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We...
Abstract How can we train a statistical mixture model on a massive data set? In this paper, we show ...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
The massive growth of modern datasets from different sources such as videos, social networks, and se...
Latent variable models provide a powerful framework for describing complex data by capturing its str...
We describe an adaptation of the simulated annealing algorithm to nonparametric clus-tering and rela...
Subspace clustering separates data points ap-proximately lying on union of affine subspaces into sev...
Bayesian non-parametric models, despite their theoretical elegance, face a serious computational bur...
This thesis studies clustering problems on data streams, specifically with applications to metric sp...
We study the problem of constructing coresets for clustering problems with time series data. This pr...
The k-means problem seeks a clustering that minimizes the sum of squared errors cost function: For i...
How can we train a statistical mixture model on a massive data set? In this work we show how to cons...
One desirable property of machine learning algorithms is the ability to balance the number of p...
Practical applications of Bayesian nonparamet-ric (BNP) models have been limited, due to their high ...
<p>Capturing high dimensional complex ensembles of data is becoming commonplace in a variety of appl...