We derive an alternative proof for the regret of Thompson Sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\widetilde{O}(d^{3/2}\sqrt{T})$, as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function, and how selecting optimal arms associated with \textit{optimistic} parameters controls it. Thus we show that TS can be seen as a generic randomized algorithm whose sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrt{d}$ regret factor compared to a U...
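To make the sampling-based mechanism described above concrete, here is a minimal sketch of Thompson Sampling for a linear bandit with a finite arm set. The specific inflation factor `v`, the finite arm set, and the use of a known `theta_star` to simulate rewards are simplifying assumptions for illustration, not the exact construction analyzed in the abstract.

```python
import numpy as np

def linear_thompson_sampling(arms, T, theta_star, sigma=1.0, delta=0.1, lam=1.0, rng=None):
    """Sketch of Thompson Sampling for a stochastic linear bandit.

    arms: (K, d) array of fixed arm feature vectors (a finite arm set is
    assumed here for simplicity). theta_star is the unknown parameter,
    used only to simulate rewards in this toy setting.
    """
    rng = np.random.default_rng() if rng is None else rng
    K, d = arms.shape
    V = lam * np.eye(d)            # regularized design matrix V_t
    b = np.zeros(d)                # sum of x_s * r_s
    # Inflation of the sampling covariance; its sqrt(d) scaling is the
    # (assumed) counterpart of the extra sqrt(d) factor in the regret bound.
    v = sigma * np.sqrt(d * np.log(T / delta))
    rewards = []
    for t in range(T):
        theta_hat = np.linalg.solve(V, b)                      # regularized least-squares estimate
        cov = v**2 * np.linalg.inv(V)                          # inflated covariance
        theta_tilde = rng.multivariate_normal(theta_hat, cov)  # sampled (possibly optimistic) parameter
        a = int(np.argmax(arms @ theta_tilde))                 # play the optimal arm for theta_tilde
        x = arms[a]
        r = x @ theta_star + sigma * rng.standard_normal()     # noisy linear reward
        V += np.outer(x, x)
        b += r * x
        rewards.append(r)
    return np.array(rewards)

# Example usage on a random 5-armed, d=3 problem (illustrative only)
rng = np.random.default_rng(0)
arms = rng.standard_normal((5, 3))
theta_star = rng.standard_normal(3)
rewards = linear_thompson_sampling(arms, T=1000, theta_star=theta_star, rng=rng)
```

The key point the sketch illustrates is that the only randomization is in drawing `theta_tilde`; with fixed probability this draw is optimistic, and the arm played is simply the best arm for the sampled parameter.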
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chos...
Stochastic Rank-One Bandits (Katariya et al., 2017a,b) are a simple framework for regret minimization...
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in...
This dissertation is dedicated to the study of the Thompson Sampling (TS) algorithms designed to add...
We consider the exploration-exploitation tradeoff in linear quadratic (LQ) con...
In this paper we consider Thompson Sampling (TS) for combinatorial semi-bandit...
Thompson Sam...
In the classical stochastic k-armed bandit problem, in each of a sequence of rounds, a decision make...
We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under ...