Practical applications of Bayesian nonparamet-ric (BNP) models have been limited, due to their high computational complexity and poor scal-ing on large data. In this paper, we consider dependent nonparametric trees (DNTs), a pow-erful infinite model that captures time-evolving hierarchies, and develop a large-scale distribut-ed training system. Our major contributions in-clude: (1) an effective memoized variational in-ference for DNTs, with a novel birth-merge s-trategy for exploring the unbounded tree space; (2) a model-parallel scheme for concurrent tree growing/pruning and efficient model alignmen-t, through conflict-free model partitioning and lightweight synchronization; (3) a data-parallel scheme for variational parameter updates that...
Latent variable models provide a powerful framework for describing complex data by capturing its str...
We present a scheme for fast, distributed learn-ing on big (i.e. high-dimensional) models ap-plied t...
Appropriate tools for managing large-scale data, like online texts, images and user pro-files, are b...
Hierarchical clustering methods offer an intuitive and powerful way to model a wide variety of data ...
We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet pro...
One desirable property of machine learning algorithms is the ability to balance the number of p...
The availability of complex-structured data has sparked new research directions in statistics and ma...
In the Bayesian nonparametric family, Dirichlet Process (DP) is a prior distribution that is able to...
Bayesian networks (BNs) are highly practical and successful tools for modeling probabilistic knowled...
<p>Regularized Multinomial Logistic regression has emerged as one of the most common methods for per...
Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore ...
Abstract—In this paper we introduce the Pitman Yor Diffusion Tree (PYDT), a Bayesian non-parametric ...
This paper considers a parallel algorithm for Bayesian network structure learning from large data se...
<p>Capturing high dimensional complex ensembles of data is becoming commonplace in a variety of appl...
This paper presents a methodology for creating streaming, distributed inference algorithms for Bayes...
Latent variable models provide a powerful framework for describing complex data by capturing its str...
We present a scheme for fast, distributed learn-ing on big (i.e. high-dimensional) models ap-plied t...
Appropriate tools for managing large-scale data, like online texts, images and user pro-files, are b...
Hierarchical clustering methods offer an intuitive and powerful way to model a wide variety of data ...
We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet pro...
One desirable property of machine learning algorithms is the ability to balance the number of p...
The availability of complex-structured data has sparked new research directions in statistics and ma...
In the Bayesian nonparametric family, Dirichlet Process (DP) is a prior distribution that is able to...
Bayesian networks (BNs) are highly practical and successful tools for modeling probabilistic knowled...
<p>Regularized Multinomial Logistic regression has emerged as one of the most common methods for per...
Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore ...
Abstract—In this paper we introduce the Pitman Yor Diffusion Tree (PYDT), a Bayesian non-parametric ...
This paper considers a parallel algorithm for Bayesian network structure learning from large data se...
<p>Capturing high dimensional complex ensembles of data is becoming commonplace in a variety of appl...
This paper presents a methodology for creating streaming, distributed inference algorithms for Bayes...
Latent variable models provide a powerful framework for describing complex data by capturing its str...
We present a scheme for fast, distributed learn-ing on big (i.e. high-dimensional) models ap-plied t...
Appropriate tools for managing large-scale data, like online texts, images and user pro-files, are b...