An Optimal Algorithm for Linear Bandits

Nicolò Cesa-Bianchi
Sham Kakade

Publication date

September 2016

Abstract

We provide the first algorithm for online bandit linear optimization whose regret after T rounds is of order √(Td ln N) on any finite class X ⊆ ℝ^d of N actions, and of order d√T (up to log factors) when X is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an application to a model of linear bandits with expert advice. Interestingly, these results show that bandit linear optimization with expert advice in d dimensions is no more difficult (in terms of the achievable regret) than the online d-armed bandit problem with expert advice (where EXP4 is optimal).

1 Introduction and Related Work

The problem of bandit lin...
