We consider multi-action restless bandits with multiple resource constraints, also referred to as weakly coupled Markov decision processes. This problem is important in recommender systems, active learning, revenue management, and many other areas. An optimal policy can be theoretically found by solving a Markov decision process, but the computation required scales exponentially in the number of arms $N$. Thus, scalable approximate policies are important for problems with large $N$. We study the optimality gap, i.e., the loss in expected performance vs. that of the optimal policy, of such scalable policies. The tightest previous theoretical bounds, which apply only for a handful of carefully-designed policies, show that this optima...
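The abstract's claim that exact solution scales exponentially in the number of arms $N$ can be made concrete: if each arm has $S$ per-arm states, the joint MDP is over $S^N$ states, so exact dynamic programming is infeasible beyond small $N$. A minimal illustrative sketch (the values of $S$ and $N$ below are assumptions for illustration, not figures from the paper):

```python
def joint_state_count(num_arms: int, states_per_arm: int) -> int:
    """Size of the joint state space of a restless bandit:
    each arm contributes a factor of S, giving S**N states."""
    return states_per_arm ** num_arms

# With S = 4 per-arm states, the joint space grows exponentially in N:
for n in (5, 10, 20):
    print(n, joint_state_count(n, 4))
# 4**5 = 1024, 4**10 = 1048576, 4**20 ~ 1.1e12
```

This is why the scalable policies studied in these papers (index policies, LP relaxations, fluid approximations) work per-arm rather than on the joint state space.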
We present a technique for computing approximately optimal solutions to stochastic resource allocati...
We propose an asymptotically optimal heuristic, which we termed the Randomized Assignment Control (R...
Abstract—We consider two variants of the standard multi-armed bandit problem, namely, the multi-arme...
We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed as R(MA)...
We study a resource allocation problem with varying requests and with resources of limited capacity ...
We provide a framework to analyse control policies for the restless Markovian bandit model, under bo...
Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
Bandits are one of the most basic examples of decision-making with uncertainty. A Markovian restless...
Abstract—The multi-armed bandit problem and one of its most interesting extensions, the restless ban...
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on ...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...
We investigate the optimal allocation of effort to a collection of n projects. The projects are '...
We consider the restless Markov bandit problem, in which the state of each arm evolves according to ...
We consider a multiarmed bandit problem where the expected reward of each arm is a linear function o...