Abstract

We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies.
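For concreteness, the sketch below implements the standard one-period KG index for a multi-armed bandit with independent normal priors and known Gaussian observation noise, together with the usual "online" rule that weights the value of information by the remaining horizon. This is a minimal illustration of the unadjusted policy the abstract refers to, not the paper's adjusted variant; the assumptions (normal priors, known noise variance) and all function names are ours.

```python
import numpy as np
from scipy.stats import norm

def kg_values(mu, sigma, noise_sd):
    """One-period knowledge-gradient value for each arm, assuming
    independent normal priors N(mu[i], sigma[i]^2) and observations
    corrupted by Gaussian noise with standard deviation noise_sd."""
    # Standard deviation of the one-step change in the posterior mean.
    sigma_tilde = sigma**2 / np.sqrt(sigma**2 + noise_sd**2)
    # For each arm, the best competing posterior mean among the others.
    order = np.sort(mu)
    best, second = order[-1], order[-2]
    others_best = np.where(mu == best, second, best)
    # KG index: sigma_tilde * f(z) with f(z) = z*Phi(z) + phi(z).
    z = -np.abs(mu - others_best) / sigma_tilde
    return sigma_tilde * (z * norm.cdf(z) + norm.pdf(z))

def kg_choose(mu, sigma, noise_sd, remaining):
    """Online KG: trade immediate reward against the value of the
    information gained, scaled by the number of pulls remaining."""
    return int(np.argmax(mu + remaining * kg_values(mu, sigma, noise_sd)))

# Example: 4 arms, flat priors, 50 pulls remaining.
mu = np.array([0.2, 0.5, 0.4, 0.1])
sigma = np.ones(4)
print(kg_choose(mu, sigma, noise_sd=1.0, remaining=50))
```

Because the index is a one-period look-ahead, it can undervalue arms whose marginal value of information is non-concave in the number of future pulls; the adjustment studied in the paper approximates a multi-step look-ahead to correct for exactly this.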