We consider a bandit problem which involves sequential sampling from two populations (arms). Each arm produces a noisy reward realization that depends on an observable random covariate. The goal is to maximize cumulative expected reward. We derive general lower bounds on the performance of any admissible policy, and develop an algorithm whose performance matches the order of this lower bound up to logarithmic terms. This is done by decomposing the global problem into suitably “localized” bandit problems. The proofs blend ideas from nonparametric statistics with traditional methods from the bandit literature.
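The localization idea described above can be illustrated with a toy sketch: partition the covariate space into bins and run an independent standard bandit rule (here UCB, for concreteness) inside each bin. This is only a minimal illustration under simplifying assumptions — uniform covariates on [0, 1], Bernoulli rewards, a fixed number of bins — not the paper's exact policy or tuning.

```python
import math
import random

def binned_ucb(reward_fns, n_rounds, n_bins=8, seed=0):
    """Toy 'localized' bandit policy: split the covariate space [0, 1]
    into n_bins intervals and run an independent UCB rule in each bin.
    reward_fns maps (arm index applied to covariate x) to a reward in [0, 1]."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [[0] * n_arms for _ in range(n_bins)]  # pulls per (bin, arm)
    sums = [[0.0] * n_arms for _ in range(n_bins)]  # reward totals per (bin, arm)
    total = 0.0
    for _ in range(n_rounds):
        x = rng.random()                         # observe the covariate
        b = min(int(x * n_bins), n_bins - 1)     # locate its bin
        if 0 in counts[b]:
            # play each arm once in a fresh bin
            a = counts[b].index(0)
        else:
            # then follow the UCB index, computed only from this bin's data
            n_b = sum(counts[b])
            a = max(range(n_arms),
                    key=lambda i: sums[b][i] / counts[b][i]
                    + math.sqrt(2 * math.log(n_b) / counts[b][i]))
        r = reward_fns[a](x)
        counts[b][a] += 1
        sums[b][a] += r
        total += r
    return total / n_rounds

# Two-armed example whose mean rewards cross at x = 0.5, so the better
# arm depends on the covariate and no single arm is globally optimal.
noise = random.Random(1)
arms = [lambda x: 1.0 if noise.random() < 0.7 - 0.4 * x else 0.0,
        lambda x: 1.0 if noise.random() < 0.3 + 0.4 * x else 0.0]
avg_reward = binned_ucb(arms, n_rounds=20000)
```

In this example the covariate-aware oracle earns mean reward 0.6 per round while the best single arm earns only 0.5, so an average noticeably above 0.5 indicates the binned policy is exploiting the covariate.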