We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B. The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward for pulling an arm depends on both the current state and the action taken in the corresponding MDP. Since finding the optimal policy is typically intractable, we propose a computationally appealing index policy, the Occupancy-Measured-Reward Index Policy, for the finite-horizon R(MA)^2B. Our index policy is well defined without requiring an indexability condition and is provably asymptotically optimal as the number of arms tends to infinity. We then adopt a learning perspective where the system parameters are unknown, and propose R(MA)^2B-...
We consider a restless bandit problem with Gaussian autoregressive arms, where the state of an arm i...
The Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notorious...
We consider the multi-armed restless bandit problem (RMABP) with an infinite horizon average cost ob...
155 pages. We consider multi-action restless bandits with multiple resource constraints, also referred...
We provide a framework to analyse control policies for the restless Markovian bandit model, under bo...
We propose an asymptotically optimal heuristic, which we termed the Randomized Assignment Control (R...
We consider the restless Markov bandit problem, in which the state of each arm evolves according to ...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...
We investigate the optimal allocation of effort to a collection of n projects. The projects are '...
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
We consider a restless multiarmed bandit in which each arm can be in one of two states. When an arm ...
The class of restless bandits proposed by Whittle (1988) has long been known to be intractable. ...
This article considers an important class of discrete time restless bandits, given by the discounted...