Abstract. We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit problem. The KG policy is based on a one-period look-ahead, which is known to underperform in other learning problems when the marginal value of information is non-concave. We present an adjustment that corrects for non-concavity and approximates a multi-step look-ahead, and compare its performance to the unadjusted KG policy and other heuristics. We provide guidance for determining when adjustment will improve performance, and when it is unnecessary. We present evidence suggesting that KG is generally robust in the multi-armed bandit setting, which argues in favour of KG as an alternative to index policies.
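To make the one-period look-ahead concrete, the following is a minimal sketch of an online KG rule for Bernoulli arms with Beta posteriors. The function names and the simple horizon weighting are illustrative assumptions, not the paper's exact formulation: the KG value of an arm is the expected improvement in the best posterior mean after one more observation of that arm.

```python
import numpy as np

def kg_values(alpha, beta):
    """One-step knowledge gradient for Bernoulli arms with Beta(alpha, beta)
    posteriors: expected gain in the best posterior mean from one more pull."""
    mu = alpha / (alpha + beta)
    best = mu.max()
    kg = np.empty_like(mu)
    for i in range(len(mu)):
        p = mu[i]  # predictive probability of a success on arm i
        # Posterior mean of arm i after observing a success / a failure.
        mu_succ = (alpha[i] + 1) / (alpha[i] + beta[i] + 1)
        mu_fail = alpha[i] / (alpha[i] + beta[i] + 1)
        # Best competing mean among the other arms.
        others = np.delete(mu, i).max()
        exp_best = p * max(mu_succ, others) + (1 - p) * max(mu_fail, others)
        kg[i] = exp_best - best
    return kg

def kg_policy(alpha, beta, horizon_remaining):
    """Online KG: trade off the current mean reward against the value of
    information, weighted by how many pulls remain to exploit what is learned."""
    mu = alpha / (alpha + beta)
    return int(np.argmax(mu + horizon_remaining * kg_values(alpha, beta)))
```

The non-concavity issue the abstract discusses arises because this one-step value can be (near) zero even when several more samples of an arm would be highly informative; the adjustment it proposes approximates that multi-step look-ahead.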
Abstract. The multi-objective, multi-armed bandits (MOMABs) problem is a Markov decision process wit...
This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on m...
The Knowledge Gradient (KG) policy was originally proposed for online ranking and selection problems...
Abstract. We consider a class of multi-armed bandit problems where the reward obtained by pulling an a...
A multi-objective multi-armed bandit (MOMAB) problem is a sequential decision process with stochasti...
How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We survey the literature on multi-armed bandit models and their applications in economics. The multi...
Abstract. We compare well-known action selection policies used in reinforcement learning like ε-gree...
The stochastic multi-armed bandit problem is an important model for studying the exploration-exploit...
Abstract. A multi-armed bandit problem is a search problem in which a learning agent must select the o...
An empirical comparative study is made of a sample of action selection policies on a test suite of t...