In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandits. We demonstrate that, perhaps surprisingly, TS is sub-optimal for this problem in the sense that its regret scales exponentially in the ambient dimension, and its minimax regret scales almost linearly. This phenomenon occurs under a wide variety of assumptions, including both non-linear and linear reward functions, with Bernoulli-distributed rewards and uniform priors. We also show that adding a fixed amount of forced exploration to TS does not alleviate the problem. We complement our theoretical results with numerical experiments and show that in practice TS can indeed perform very poorly in some high-dimensional situations.
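As a rough illustration of the algorithm discussed above, the following is a minimal sketch of Thompson Sampling with Beta posteriors for a Bernoulli semi-bandit, assuming a linear reward over a "choose m of d base arms" action set. The instance (`d`, `m`, `mu`, the horizon) is invented for illustration and is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: d base arms with Bernoulli means `mu`;
# the action set is all subsets of size m, with linear (sum) reward.
d, m, horizon = 6, 2, 5000
mu = rng.uniform(0.1, 0.9, size=d)

# Beta(1, 1) = uniform prior on each base arm's mean.
alpha = np.ones(d)
beta = np.ones(d)

best_value = np.sort(mu)[-m:].sum()  # value of the optimal super-arm
cum_regret = 0.0

for t in range(horizon):
    theta = rng.beta(alpha, beta)          # one posterior sample per base arm
    action = np.argsort(theta)[-m:]        # greedy oracle for the linear reward
    rewards = rng.binomial(1, mu[action])  # semi-bandit feedback: one reward per chosen arm
    alpha[action] += rewards               # Bayesian update of every observed arm
    beta[action] += 1 - rewards
    cum_regret += best_value - mu[action].sum()

print(f"cumulative regret after {horizon} rounds: {cum_regret:.1f}")
```

With semi-bandit feedback each chosen base arm's reward is observed individually, which is why the posterior of every arm in the selected subset can be updated at each round.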
Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework ...
We investigate the stochastic combinatorial multi-armed bandit with semi-bandit fe...
We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under ...
We derive an alternative proof for the regret of Thompson sampling (TS) in th...
We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, w...
The multi-armed bandit problem is a popular model for studying the exploration/exploitation trade-off in...
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chos...
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and ob...
We discuss a variant of Thompson sampling for nonparametric reinforcement learning in a countable cl...