We propose a general method for distributed Bayesian model choice, using the marginal likelihood, in which a data set is split into non-overlapping subsets. These subsets are accessed only locally by individual workers, and no data is shared between workers. We approximate the model evidence for the full data set through Monte Carlo sampling from the posterior on every subset, yielding a model evidence per subset. The results are combined using a novel approach that corrects for the splitting using summary statistics of the generated samples. Our divide-and-conquer approach enables Bayesian model choice in the large-data setting, exploiting all available information while limiting communication between workers. We derive theoretical error bou...
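The structure of such a splitting correction can be illustrated in a conjugate Gaussian toy model, where everything is analytic. Writing the full evidence as a product of per-subset evidences times a correction factor, the identity p(y) = [∏ₖ p(yₖ)] ∫ ∏ₖ p(μ|yₖ) / p(μ)^(K−1) dμ holds for any split into K subsets. The sketch below is an illustration of that identity only, not the paper's actual estimator (which the abstract says uses summary statistics of posterior samples); the model, variable names, and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: y_i ~ N(mu, sigma2) with sigma2 known, prior mu ~ N(m0, s02).
sigma2, m0, s02 = 1.0, 0.0, 4.0
y = rng.normal(0.5, np.sqrt(sigma2), size=120)

def norm_logpdf(x, m, v):
    """Log density of N(m, v) at x (v is a variance)."""
    return -0.5 * (np.log(2 * np.pi * v) + (x - m) ** 2 / v)

def log_evidence(y, m0, s02, sigma2):
    """Exact log marginal likelihood via the sequential predictive
    decomposition log p(y) = sum_i log p(y_i | y_<i); also returns the
    posterior mean and variance of mu given y."""
    m, v, total = m0, s02, 0.0
    for yi in y:
        total += norm_logpdf(yi, m, v + sigma2)   # predictive p(y_i | y_<i)
        prec = 1.0 / v + 1.0 / sigma2             # conjugate posterior update
        m = (m / v + yi / sigma2) / prec
        v = 1.0 / prec
    return m, v, total

# Each of K workers sees only its own subset and uses the full prior.
K = 4
sub = [log_evidence(s, m0, s02, sigma2) for s in np.array_split(y, K)]
naive = sum(t for _, _, t in sub)   # naive product of sub-evidences: prior is counted K times

# Correction factor log ∫ Π_k p(mu|y_k) / p(mu)^(K-1) dmu, analytic here because
# all densities are Gaussian: collect the quadratic exp(-a*mu^2/2 + b*mu + c).
a = sum(1.0 / v for _, v, _ in sub) - (K - 1) / s02
b = sum(m / v for m, v, _ in sub) - (K - 1) * m0 / s02
c = -0.5 * (sum(m * m / v for m, v, _ in sub) - (K - 1) * m0 * m0 / s02)
log_norm = (-0.5 * sum(np.log(2 * np.pi * v) for _, v, _ in sub)
            + 0.5 * (K - 1) * np.log(2 * np.pi * s02))
log_corr = log_norm + 0.5 * np.log(2 * np.pi / a) + b * b / (2 * a) + c

_, _, full = log_evidence(y, m0, s02, sigma2)
print(full, naive + log_corr)   # corrected sum of sub-evidences matches the full evidence
```

In a genuinely distributed run the posterior summaries (here the analytic means and variances m, v) would be estimated from each worker's Monte Carlo samples; only those summaries, not the data, need to be communicated.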
In the last decade or so, there has been a dramatic increase in storage facilities and the possibili...
Many Bayesian learning methods for massive data benefit from working with small subsets of observati...
This dissertation explores Bayesian model selection and estimation in settings where the model space...
To conduct Bayesian inference with large data sets, it is often convenient or necessary to distribut...
This paper presents an approximate method for performing Ba...
This paper studies distributed Bayesian learning in a setting encompassing a central server and mult...
A useful definition of ‘big data’ is data that is too big to process comfortably on a single machine...
Motivated by the need to analyze large, decentralized datasets, distributed Bayesian inference has b...
While modern machine learning and deep learning seem to dominate the areas where scalability and mod...
Multilevel models are extremely useful in handling large hierarchical datasets. However, computation...
Bayesian nonparametric based models are an elegant way for discovering underlying latent features wi...
This paper addresses the issue of designing an effective distributed learning system in whic...
Combining several (sample approximations of) distributions, which we term sub-posteriors, into a sin...
Simulator-based models are models for which the likelihood is intractable but simulation of syntheti...
Mutual independence is a key concept in statistics that characterizes the structural relationships b...