We consider multi-action restless bandits with multiple resource constraints, also referred to as weakly coupled Markov decision processes. This problem is important in recommender systems, active learning, revenue management, and many other areas. An optimal policy can be theoretically found by solving a Markov decision process, but the computation required scales exponentially in the number of arms $N$. Thus, scalable approximate policies are important for problems with large $N$. We study the optimality gap, i.e., the loss in expected performance vs. that of the optimal policy, of such scalable policies. The tightest previous theoretical bounds, which apply only for a handful of carefully-designed policies, show that this optima...
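The abstract's claim that exact solution scales exponentially in the number of arms $N$ can be made concrete: if each arm has $S$ per-arm states, the joint MDP is over $S^N$ states, so exact dynamic programming is infeasible beyond small $N$. A minimal illustrative sketch (the values of $S$ and $N$ below are assumptions for illustration, not figures from the paper):

```python
def joint_state_count(num_arms: int, states_per_arm: int) -> int:
    """Size of the joint state space of a restless bandit:
    each arm contributes a factor of S, giving S**N states."""
    return states_per_arm ** num_arms

# With S = 4 per-arm states, the joint space grows exponentially in N:
for n in (5, 10, 20):
    print(n, joint_state_count(n, 4))
# 4**5 = 1024, 4**10 = 1048576, 4**20 ~ 1.1e12
```

This is why the scalable policies studied in these papers (index policies, LP relaxations, fluid approximations) work per-arm rather than on the joint state space.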
We present a technique for computing approximately optimal solutions to stochastic resource allocati...
We propose an asymptotically optimal heuristic, which we termed the Randomized Assignment Control (R...
Abstract—We consider two variants of the standard multi-armed bandit problem, namely, the multi-arme...
We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed as R(MA)...
We study a resource allocation problem with varying requests and with resources of limited capacity ...
We provide a framework to analyse control policies for the restless Markovian bandit model, under bo...
Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
Bandits are one of the most basic examples of decision-making with uncertainty. A Markovian restless...
Abstract—The multi-armed bandit problem and one of its most interesting extensions, the restless ban...
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on ...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...
We investigate the optimal allocation of effort to a collection of n projects. The projects are '...
We consider the restless Markov bandit problem, in which the state of each arm evolves according to ...
We consider a multiarmed bandit problem where the expected reward of each arm is a linear function o...