We propose a tree-based procedure inspired by Monte Carlo Tree Search that dynamically modulates importance-based sampling to prioritize computation while keeping estimates of weighted sums unbiased. We apply this generic method to learning on very large training sets and to the evaluation of large-scale SVMs. The core idea is to reformulate the estimation of a score (whether a loss or a prediction estimate) as an empirical expectation, and to use a tree whose leaves carry the samples to focus effort on the problematic "heavy-weight" ones. We illustrate the potential of this approach on three problems: to improve AdaBoost and a multi-layer perceptron on 2D synthetic tasks with several million points, to train a large-scal...
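The mechanism described above, drawing samples from a tree whose leaves carry per-sample weights and dividing by the sampling probability so that a weighted sum stays unbiased, can be illustrated with a plain sum tree. This is a minimal sketch under stated assumptions, not the authors' procedure: the `SumTree` class, `estimate_weighted_sum`, and the way priorities are chosen are all hypothetical, and the Monte Carlo Tree Search style modulation of the priorities is not reproduced.

```python
import random


class SumTree:
    """Hypothetical helper: a binary tree whose leaves hold non-negative
    priorities and whose internal nodes hold the sum of their children."""

    def __init__(self, priorities):
        size = 1
        while size < len(priorities):
            size *= 2
        self.size = size
        self.nodes = [0.0] * (2 * size)  # padded leaves keep priority 0
        for i, p in enumerate(priorities):
            self.nodes[size + i] = float(p)
        for j in range(size - 1, 0, -1):
            self.nodes[j] = self.nodes[2 * j] + self.nodes[2 * j + 1]

    def total(self):
        return self.nodes[1]

    def priority(self, i):
        return self.nodes[self.size + i]

    def update(self, i, p):
        # Change one leaf, then refresh the sums on its path to the root: O(log n).
        j = self.size + i
        self.nodes[j] = float(p)
        j //= 2
        while j >= 1:
            self.nodes[j] = self.nodes[2 * j] + self.nodes[2 * j + 1]
            j //= 2

    def sample(self, rng):
        # Descend from the root, picking a child in proportion to its sum.
        # Going left when the right sum is 0 guards against float round-off
        # steering the descent into a zero-priority (e.g. padded) leaf.
        u = rng.random() * self.total()
        j = 1
        while j < self.size:
            left = self.nodes[2 * j]
            if u < left or self.nodes[2 * j + 1] == 0.0:
                j = 2 * j
            else:
                u -= left
                j = 2 * j + 1
        return j - self.size


def estimate_weighted_sum(weights, values, priorities, m, seed=0):
    """Monte Carlo estimate of sum_i weights[i] * values[i].

    Index i is drawn with probability q_i = priorities[i] / sum(priorities),
    and each draw contributes weights[i] * values[i] / q_i. The expectation of
    every term equals the full sum, provided q_i > 0 wherever the product is
    nonzero, so the estimate is unbiased for any such choice of priorities.
    """
    rng = random.Random(seed)
    tree = SumTree(priorities)
    total = tree.total()
    acc = 0.0
    for _ in range(m):
        i = tree.sample(rng)
        q = tree.priority(i) / total
        acc += weights[i] * values[i] / q
    return acc / m
```

Priorities closer to the magnitude of each term (the "heavy-weight" samples) lower the variance; if they are exactly proportional to non-negative terms, every draw returns the exact sum. Uniform priorities recover ordinary uniform subsampling, which is still unbiased but noisier.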
We propose a Monte Carlo algorithm to sample from high-dimensional probability distributions that co...
Generating low-rank approximations of kernel matrices that arise in nonlinear machine learning techn...
MOTIVATION: Random forests are fast, flexible and represent a robust approach to analyzing high dimens...
Computing expectations in high-dimensional spaces is a key challenge in probabilistic inference and...
I present a simple variation of importance sampling that explicitly searches for important...
Importance sampling is often used in machine learning when training and testing data come from diffe...
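When training and test data come from different distributions, the standard importance-sampling correction reweights each training sample by the density ratio p_test(x) / p_train(x), so that an average over training samples estimates the test-time risk. A minimal sketch, assuming both distributions are known Gaussians (train N(0, 1), test N(1, 1)) so the ratio has the closed form exp(x - 1/2); in practice the ratio must be estimated. `density_ratio`, `loss`, and both risk functions are hypothetical names.

```python
import math
import random


def density_ratio(x):
    # Assumed setting: p_train = N(0, 1), p_test = N(1, 1), hence
    # p_test(x) / p_train(x) = exp(x - 1/2).
    return math.exp(x - 0.5)


def loss(x):
    # Hypothetical per-sample loss; under N(1, 1), E[x^2] = 1 + 1^2 = 2.
    return x * x


def importance_weighted_risk(n, seed=0):
    # Samples come from the *training* distribution, weights shift them to test.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += density_ratio(x) * loss(x)
    return total / n


def plain_train_risk(n, seed=0):
    # Unweighted average: estimates the training risk E_train[x^2] = 1.
    rng = random.Random(seed)
    return sum(loss(rng.gauss(0.0, 1.0)) for _ in range(n)) / n
```

Without the weights, the same training samples estimate the training risk (about 1 here) rather than the test risk (about 2).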
This thesis consists of two papers related to large deviation results associated with impor...
171 pages. Machine learning has become ubiquitous in many areas, including high-stakes applications suc...
In this thesis, we leverage powerful statistical frameworks for optimal sequential estimation and tr...
Thesis (Ph.D.)--Boston University PLEASE NOTE: Boston University Libraries did not receive an Autho...
12 pages. For Bayesian computation in big data contexts, the divide-and-conquer MCMC concept splits th...
Many statistical problems involve learning the importance (or effect) of a variable for predicting ...
In this paper we introduce a new dynamic importance sampling propagation algorithm for Bayesian netw...
One of the fundamental machine learning tasks is that of predictive classification. Given that organ...
This thesis consists of four papers, presented in Chapters 2-5, on the topics of large deviations and s...