We focus on the effect of exploration/exploitation trade-off strategies on the algorithmic design of multi-armed bandits (MAB) with reward vectors. The Pareto dominance relation is used to assess the quality of reward vectors in infinite-horizon MAB algorithms such as UCB1 and UCB2. In single-objective MABs, there is a trade-off between the exploration of suboptimal arms and the exploitation of a single optimal arm. Pareto dominance based MABs instead exploit all Pareto optimal arms fairly while still exploring the suboptimal arms. We study the exploration vs exploitation trade-off for two UCB-like algorithms for reward vectors. We analyse the properties of the proposed MAB algorithms in terms of upper regret bounds and experimentally compare their explo...
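As a concrete illustration of how a Pareto dominance based bandit balances exploration and exploitation, the following Python sketch performs one round of a Pareto-UCB1-style selection: every objective of every arm's empirical mean reward vector is inflated by a confidence bonus, the arms whose optimistic vectors are not Pareto-dominated form the current Pareto front, and one of them is pulled uniformly at random so that all Pareto optimal arms are exploited fairly. The function names, the exact constant inside the bonus, and the uniform tie-breaking are illustrative assumptions, not the precise algorithm analysed in the paper.

import math
import random

def pareto_dominates(u, v):
    # u dominates v if it is at least as good in every objective
    # and strictly better in at least one.
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_ucb1_round(counts, mean_rewards, t):
    # counts[i]       : number of pulls of arm i so far (assumed > 0)
    # mean_rewards[i] : empirical mean reward vector of arm i (one entry per objective)
    # t               : current round index
    k = len(counts)
    d = len(mean_rewards[0])
    # Per-objective optimistic index; the exact form of the confidence bonus
    # here is an assumption in the spirit of UCB1 extended to d objectives.
    index = []
    for i in range(k):
        bonus = math.sqrt(2.0 * math.log(t * (d * k) ** 0.25) / counts[i])
        index.append([m + bonus for m in mean_rewards[i]])
    # Keep the arms whose optimistic index vectors are not Pareto-dominated ...
    front = [i for i in range(k)
             if not any(pareto_dominates(index[j], index[i]) for j in range(k) if j != i)]
    # ... and exploit the Pareto front fairly by pulling one of its arms at random.
    return random.choice(front)

After the pull, the chosen arm's count and empirical mean reward vector would be updated with the observed reward vector before the next round.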
While in general trading off exploration and exploitation in reinforcement learning is hard, under s...
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms,...
Algorithms based on upper confidence bounds for balancing exploration and expl...
We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive...
Multi-armed bandit, a popular framework for sequential decision-making problems, has recently gained...
We propose an algorithmic framework for multi-objective multi-armed bandits with multiple rewards. D...
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward m...
We introduce the budget–limited multi–armed bandit (MAB), which captures situations whe...
Many real-world stochastic environments are inherently multi-objective environments with conflicting...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We study the Pareto frontier of two archetypal objectives in multi-armed bandits, namely, regret min...
In budget–limited multi–armed bandit (MAB) problems, the learner’s actions are costly and constraine...