The multi-armed bandit is a widely studied model for sequential decision-making problems. The most studied setting in the literature is the stochastic bandit, in which the reward of each arm follows an independent distribution. However, in a wide range of applications the rewards of different alternatives are correlated to some extent. In this paper, a class of structured bandit problems is studied in which the rewards of different arms are functions of the same unknown parameter vector. To minimize the cumulative learning regret, we propose a globally informative Thompson sampling algorithm that learns and leverages the correlation among arms, and that can handle an unknown multidimensional parameter and non-monotonic reward functions. Our studies d...
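To make the shared-parameter setting above concrete, the following is a minimal Python sketch of Thompson sampling over one common parameter theta, where every arm's mean reward is a known function of theta. The reward functions, noise level, and grid posterior are invented for illustration under simplifying assumptions; this is not the algorithm from the abstract above.

```python
# Minimal sketch: Thompson sampling for a structured bandit whose arms all
# depend on one shared parameter theta (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

theta_true = 0.6                       # hidden shared parameter
arm_fns = [lambda t: t,                # arm 0: mean reward = theta
           lambda t: 1.0 - t,          # arm 1: mean reward = 1 - theta
           lambda t: 4 * t * (1 - t)]  # arm 2: non-monotonic in theta
noise_sd = 0.1                         # known Gaussian reward noise

# Discretised posterior over theta (a simple grid approximation).
grid = np.linspace(0.0, 1.0, 201)
log_post = np.zeros_like(grid)         # flat prior

for _ in range(2000):
    # Thompson step: draw theta from the current posterior,
    # then play the arm that looks best under that draw.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    theta_s = rng.choice(grid, p=post)
    arm = int(np.argmax([f(theta_s) for f in arm_fns]))

    # Observe a noisy reward and update the posterior. Note that one pull
    # is informative about *every* arm, because all arms share theta.
    r = arm_fns[arm](theta_true) + noise_sd * rng.normal()
    log_post += -0.5 * ((r - arm_fns[arm](grid)) / noise_sd) ** 2

post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior mean of theta:", float(grid @ post))
```

The key design point the sketch exercises is that the posterior lives on the shared parameter rather than on per-arm means, which is what lets pulls of one arm reduce uncertainty about all the others.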
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chos...
The multi-armed bandit problem has been studied for de...
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision prob...
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in...
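Since several of these abstracts turn on the exploration/exploitation trade-off, here is a minimal, self-contained sketch of classic Beta-Bernoulli Thompson sampling; the arm probabilities and horizon are made up for illustration and are not taken from any of the cited works.

```python
# Minimal sketch: Beta-Bernoulli Thompson sampling on three independent arms
# (illustrative parameters; not from any specific paper above).
import numpy as np

rng = np.random.default_rng(1)
true_p = np.array([0.3, 0.5, 0.7])  # hidden Bernoulli success probabilities
alpha = np.ones(3)                  # Beta posterior: successes + 1
beta = np.ones(3)                   # Beta posterior: failures + 1

for _ in range(5000):
    # Draw one plausible mean per arm from its posterior; play the best draw.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward            # conjugate update is a simple count
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))  # concentrates on arm 2
```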
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit...
Presented at the Thirty-eighth International Conference on Machine Learning (ICML 2021)...
We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-arme...
We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, w...
We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under ...
In this work, we address the combinatorial optimization problem in the stochastic bandit setting wit...
The multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in t...
The stochastic multi-armed bandit problem is a popular model of the exploratio...
In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solut...