We propose algorithms based on a multi-level Thompson sampling scheme for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit, we give upper bounds on the expected cumulative regret that show how the regret depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
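The abstract describes a multi-level Thompson sampling scheme over a given clustering but does not spell out the procedure. Below is a minimal sketch of one natural two-level instantiation, assuming Bernoulli rewards with Beta priors at both the cluster and arm level; the function name, the aggregated cluster-level posterior, and the update rule are illustrative assumptions, not necessarily the authors' exact algorithm.

```python
import numpy as np

def clustered_thompson_sampling(clusters, pull, horizon, rng=None):
    """Two-level Thompson sampling over a fixed clustering of Bernoulli arms.

    clusters : list of lists of arm indices (the given cluster structure)
    pull     : pull(arm) -> reward in {0, 1}
    horizon  : number of rounds to play
    """
    rng = rng or np.random.default_rng()
    n_arms = sum(len(c) for c in clusters)
    # Beta(1, 1) priors: success/failure counts per arm and aggregated per cluster.
    arm_s, arm_f = np.ones(n_arms), np.ones(n_arms)
    clu_s, clu_f = np.ones(len(clusters)), np.ones(len(clusters))
    rewards = []
    for _ in range(horizon):
        # Level 1: sample a mean for each cluster and pick the most promising cluster.
        k = int(np.argmax(rng.beta(clu_s, clu_f)))
        # Level 2: sample a mean for each arm inside that cluster and pick the best arm.
        arms = clusters[k]
        a = arms[int(np.argmax(rng.beta(arm_s[arms], arm_f[arms])))]
        r = pull(a)
        # Update both the arm-level and the cluster-level posteriors with the observed reward.
        arm_s[a] += r
        arm_f[a] += 1 - r
        clu_s[k] += r
        clu_f[k] += 1 - r
        rewards.append(r)
    return rewards

# Hypothetical usage: 6 arms in 2 clusters, the second cluster containing the best arms.
means = [0.10, 0.20, 0.15, 0.70, 0.60, 0.65]
env_rng = np.random.default_rng(0)
history = clustered_thompson_sampling(
    clusters=[[0, 1, 2], [3, 4, 5]],
    pull=lambda a: int(env_rng.binomial(1, means[a])),
    horizon=2000,
)
```

The intuition matched by this sketch is that sampling first at the cluster level lets the learner discard whole groups of suboptimal arms quickly, while only the arms in the chosen cluster need per-arm posterior sampling in each round, which is where the claimed regret and computational savings over standard Thompson sampling would come from.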