Cheap Bandits

Hanawal, Manjesh Kumar
Saligrama, Venkatesh
Valko, Michal
Munos, Rémi

Publication date

January 2015

Publisher

HAL CCSD

Abstract

We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of actions to observe represents some (geographical) area. The importance of this setting is that in these applications, it is actually cheaper to observe the average reward of a group of actions than the reward of a single action. We show that when the reward is smooth over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while minimizing the sensing cost. In this paper we propose CheapUCB, an algorithm that matches the regret guarantees ...
