Stochastic multi–armed bandits solve the Exploration–Exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk–aversion, where the objective is to compete against the arm with the best risk–return trade–off. This setting proves to be intrinsically more difficult than the standard multi–armed bandit setting, due in part to an exploration risk which introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, investigate their theoretical guarantees, and report preliminary empirical results.
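As an illustrative sketch of how a variance-based risk–return trade–off can be formalized (the notation below, including the risk-tolerance parameter $\rho$, is an assumption for exposition and not taken verbatim from the paper), each arm $i$ with mean reward $\mu_i$ and reward variance $\sigma_i^2$ may be scored by a mean–variance index
\[
\mathrm{MV}_i \;=\; \sigma_i^2 \;-\; \rho\,\mu_i ,
\qquad
i^\star \;=\; \arg\min_i \mathrm{MV}_i ,
\]
where $\rho \ge 0$ controls how much expected return is traded for reduced variability; regret would then be measured against the arm $i^\star$ with the best (smallest) mean–variance rather than against the arm with the largest mean, as in the standard setting.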