The stochastic multi-armed bandit problem is an important model for studying the exploration-exploitation tradeoff in reinforcement learning. Although many algorithms for the problem are well understood theoretically, empirical confirmation of their effectiveness is generally scarce. This paper presents a thorough empirical study of the most popular multi-armed bandit algorithms. Three important observations can be made from our results. First, simple heuristics such as ε-greedy and Boltzmann exploration outperform theoretically sound algorithms in most settings by a significant margin. Second, the performance of most algorithms varies dramatically with the parameters of the bandit problem. Our study identifies for each algorithm the set...
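The two heuristics named in this abstract can be sketched briefly. The code below is an illustrative implementation of ε-greedy and Boltzmann (softmax) arm selection with incremental mean updates, not the paper's experimental code; the function names and the temperature default are my own choices.

```python
import math
import random

def eps_greedy_arm(values, eps=0.1, rng=random):
    """With probability eps pull a uniformly random arm (explore);
    otherwise pull the arm with the highest empirical mean (exploit)."""
    if rng.random() < eps:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def boltzmann_arm(values, temperature=0.5, rng=random):
    """Sample an arm with probability proportional to exp(value / temperature).
    High temperature -> near-uniform exploration; low -> near-greedy."""
    weights = [math.exp(v / temperature) for v in values]
    r = rng.random() * sum(weights)
    acc = 0.0
    for arm, w in enumerate(weights):
        acc += w
        if r < acc:
            return arm
    return len(weights) - 1  # guard against floating-point round-off

def update_mean(counts, values, arm, reward):
    """Incrementally update the empirical mean reward of the pulled arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]
```

With ε = 0 the first rule is purely greedy, which makes its exploit branch easy to check in isolation.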
Algorithms based on upper confidence bounds for balancing exploration and expl...
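A representative upper-confidence-bound rule is UCB1. The sketch below is a minimal illustration under standard assumptions (rewards in [0, 1], `t` counting total pulls so far), not code from the cited work.

```python
import math

def ucb1_arm(counts, values, t):
    """UCB1: after trying every arm once, pull the arm maximizing
    empirical mean + sqrt(2 * ln(t) / n_a), i.e. mean plus an
    exploration bonus that shrinks as an arm is pulled more often."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm  # initialization: play each arm once
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]))
```

The bonus term is what balances exploration and exploitation: an under-sampled arm can win the argmax even with a lower empirical mean.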
We consider a multi-armed bandit problem where the decision maker can explore and exploit different...
This thesis considers the multi-armed bandit (MAB) problem, both the traditional bandit feedback and...
The stochastic multi-armed bandit problem is a popular model of the exploratio...
Multi-armed bandit, a popular framework for sequential decision-making problems, has recently gained...
We consider multi-armed bandit problems where the number of arms is larger than the possible number ...
Part 5: Machine Learning. The multi-armed bandit problem has been studied for de...
We present a formal model of human decision-making in explore-exploit tasks using the conte...