We consider an extension to the restless multi-armed bandit (RMAB) problem with unknown arm dynamics, where an unknown exogenous global Markov process governs the reward distribution of each arm. Under each global state, the reward process of each arm evolves according to an unknown Markovian rule, which differs across arms. At each time step, a player chooses one of N arms to play and receives a random reward from a finite set of reward states. The arms are restless, that is, their local states evolve regardless of the player's actions. Motivated by recent studies on related RMAB settings, the regret is defined as the reward loss with respect to a player that knows the dynamics of the problem, and plays at each time ...
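To make the setting concrete, the following is a minimal simulation sketch of the model described above, not the authors' algorithm. Every number and name in it is a hypothetical choice for illustration: two global states, two arms, two reward states, the transition matrices `GLOBAL_P` and `ARM_P`, and the `round_robin` policy. In the actual problem these dynamics are unknown to the player; here they are fixed only so the environment can be simulated.

```python
import random

# Hypothetical finite set of reward states (index -> reward value).
REWARDS = [0.0, 1.0]

# Hypothetical exogenous global Markov chain with two states.
GLOBAL_P = [[0.9, 0.1],
            [0.2, 0.8]]

# ARM_P[arm][global_state] is that arm's local transition matrix.
# The matrices differ across arms and across global states.
ARM_P = [
    [[[0.7, 0.3], [0.4, 0.6]], [[0.2, 0.8], [0.1, 0.9]]],
    [[[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.6, 0.4]]],
]


def next_state(state, matrix, rng):
    """Sample the next state from row `state` of a transition matrix."""
    return rng.choices(range(len(matrix)), weights=matrix[state])[0]


def simulate(policy, horizon, seed=0):
    """Run `policy` for `horizon` steps and return the total reward."""
    rng = random.Random(seed)
    global_state = 0
    local_states = [0] * len(ARM_P)
    total = 0.0
    for t in range(horizon):
        arm = policy(t, len(ARM_P))
        total += REWARDS[local_states[arm]]
        # Restless: every arm moves, whether or not it was played.
        local_states = [
            next_state(s, ARM_P[i][global_state], rng)
            for i, s in enumerate(local_states)
        ]
        # The global chain evolves independently of the player's actions.
        global_state = next_state(global_state, GLOBAL_P, rng)
    return total


def round_robin(t, n_arms):
    """Hypothetical baseline policy: cycle through the arms in order."""
    return t % n_arms


if __name__ == "__main__":
    print(simulate(round_robin, horizon=1000))
```

Regret in this setting would compare such a total against that of a player who knows `GLOBAL_P` and `ARM_P`; that benchmark policy is not implemented here.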
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...
We introduce robustness in restless multi-armed bandits (RMABs), a popular model for constr...
We consider the restless Markov bandit problem, in which the state of each arm evolves according to ...
The multi-armed restless bandit problem is studied in the case where the pay-off distributions are s...
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selectio...
We consider a bandit problem where at any time, the decision maker can add new arms to her considera...
We consider a Latent Bandit problem where the latent state keeps changing in time according to an un...
We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed as R(MA)...
Multi-player multi-armed bandit is an increasingly relevant decision-making problem, motivated by ap...
We provide a framework to analyse control policies for the restless Markovian bandit model, under bo...
In this paper, we propose a new multi-armed bandit problem called the Gambler's Ruin Bandit Problem ...