We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of K actions in response to the observed context, and observes the reward only for that chosen action. Our method assumes access to an oracle for solving fully supervised cost-sensitive classification problems and achieves the statistically optimal regret guarantee with only Õ(√(KT)) oracle calls across all T rounds. By doing so, we obtain the most practical contextual bandit learning algorithm amongst approaches that work for general policy classes. We further conduct a proof-of-concept experiment which demonstrates the excellent computational and prediction performance of (an online variant of) our algorithm relative to several b...
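The abstract describes the interaction protocol and the role of the cost-sensitive oracle only at a high level. The sketch below is a minimal epsilon-greedy stand-in for that protocol, not the paper's algorithm (which achieves the Õ(√(KT)) oracle-call bound via a carefully constructed distribution over policies); the simulated environment, the per-action least-squares "oracle", and all names (CostSensitiveOracle, draw_context, etc.) are hypothetical illustrations.

```python
# Minimal sketch of the oracle-based contextual bandit protocol: in each
# round, observe a context, choose one of K actions, and see the reward
# of that action only. An epsilon-greedy stand-in, NOT the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
K, D, T, EPS = 4, 5, 2000, 0.1           # actions, context dim, rounds, exploration rate

true_w = rng.normal(size=(K, D))          # hidden reward model (for simulation only)

def draw_context():
    return rng.normal(size=D)

def reward(x, a):                         # bandit feedback: chosen arm's reward only
    return float(true_w[a] @ x > 0)

# Stand-in "oracle": given cost-sensitive examples (context, cost vector),
# produce a policy. Approximated here by one least-squares cost predictor
# per action; a real oracle would solve the cost-sensitive problem exactly.
class CostSensitiveOracle:
    def __init__(self):
        self.xs, self.costs = [], []

    def update(self, x, a, r, p):
        c = np.zeros(K)                   # inverse-propensity cost vector:
        c[a] = -r / p                     # unbiased estimate of each action's cost
        self.xs.append(x)
        self.costs.append(c)

    def policy(self, x):
        if len(self.xs) < K:              # not enough data yet: act uniformly
            return int(rng.integers(K))
        X = np.asarray(self.xs)
        C = np.asarray(self.costs)
        W, *_ = np.linalg.lstsq(X, C, rcond=None)   # fit per-action cost predictors
        return int(np.argmin(x @ W))      # action with lowest predicted cost

oracle = CostSensitiveOracle()
total = 0.0
for t in range(T):
    x = draw_context()
    greedy = oracle.policy(x)
    probs = np.full(K, EPS / K)           # explore uniformly with prob. EPS...
    probs[greedy] += 1 - EPS              # ...otherwise exploit the oracle's policy
    a = int(rng.choice(K, p=probs))
    r = reward(x, a)                      # only the chosen action's reward is seen
    oracle.update(x, a, r, probs[a])
    total += r
print(f"average reward over {T} rounds: {total / T:.3f}")
```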
The data explosion and the development of artificial intelligence (AI) have fueled the demand for recomme...
We study the problem of model selection for contextual bandits, in which the algorithm must balance ...
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known bef...
Contextual bandit algorithms are essential for solving many real-world interac...
In the classical conte...
In contextual bandits, an algorithm must choose actions given observed contexts, learning from a r...
The bandit problem models a sequential decision process between a player and an environment. In the ...
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side...
Learning an action policy for autonomous agents in a decentralized multi-agent environment has remained...
Contextual multi-armed bandit (MAB) algorithms have shown promise for maximizing cumulative r...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
Contextual combinatorial cascading bandit ($C^{3}$-bandit) is a powerful multi-armed bandit framew...
Machine and Statistical Learning techniques are used in almost all online advertisement systems. The...