Optimal PAC Multiple Arm Identification with Applications to

Yuan Zhou
Xi Chen
Jian Li

Publication date

January 2014

Abstract

We study the problem of selecting K arms with the highest expected rewards in a stochastic N-armed bandit game. Instead of using existing evaluation metrics (e.g., misidentification probability (Bubeck et al., 2013) or the metric in Explore-K (Kalyanakrishnan & Stone, 2010)), we propose to use the aggregate regret, which is defined as the gap between the average reward of the optimal solution and that of our solution. Besides being a natural metric in itself, we argue that in many applications, such as our motivating example from crowdsourcing, the aggregate regret bound is more suitable. We propose a new PAC algorithm, which, with probability at least 1−δ, identifies a set of K arms with regret at most ε. We provide a detailed analys...
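
To make the aggregate regret concrete, the following is a minimal sketch of the metric as the abstract defines it, not the paper's algorithm. It assumes the true expected rewards of all arms are known, which is not the case in the actual bandit setting, where they must be estimated from samples. The function name `aggregate_regret` and the example values are hypothetical.

```python
def aggregate_regret(means, chosen, k):
    """Gap between the average expected reward of the best k arms
    and the average expected reward of the chosen k arms.

    means  -- true expected reward of each arm (assumed known here)
    chosen -- indices of the k arms that were selected
    """
    if len(chosen) != k or len(set(chosen)) != k:
        raise ValueError("chosen must contain exactly k distinct arm indices")
    # Average reward of the optimal solution: the k arms with the highest means.
    best_avg = sum(sorted(means, reverse=True)[:k]) / k
    # Average reward of the selected solution.
    chosen_avg = sum(means[i] for i in chosen) / k
    return best_avg - chosen_avg


# Hypothetical example: 4 arms, select K = 2.
means = [0.9, 0.8, 0.5, 0.3]
regret = aggregate_regret(means, [0, 2], 2)
```

Under this metric, a selection that swaps one top arm for a slightly worse one incurs only a small regret, whereas a misidentification-probability metric would count it as a full failure.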
