This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy whose regret is of order log(n) with probability at least 1-1/n. They also show that the popular UCB1 policy of Auer et al. (2002) does not share this property. This work first answers an open question: it extends this negative result to any anytime policy. The second contribution is the design of anytime robust policies for specific multi-armed bandit problems in which some restrictions are placed on the set of possible distributions of the different arms.
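To make the object of study concrete, the following is a minimal sketch of the UCB1 index policy together with the pseudo-regret of a run, under assumptions that are not part of the abstract: Bernoulli rewards, the standard exploration bonus sqrt(2 log t / T_i), and at least as many plays as arms. The function names `ucb1`, `pseudo_regret` and `simulate` are hypothetical and chosen for illustration only; this is not the authors' experimental code. Note that the index depends on the current round t and not on the horizon n, which is what makes the policy anytime.

```python
import math
import random


def ucb1(means, n, rng):
    """Play n rounds of UCB1 on Bernoulli arms; return pull counts per arm.

    Assumes n >= len(means) so that every arm is pulled at least once.
    """
    k = len(means)
    counts = [0] * k      # number of times each arm was pulled
    sums = [0.0] * k      # total reward collected from each arm
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialise the estimates
        else:
            # index = empirical mean + exploration bonus sqrt(2 log t / T_i)
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def pseudo_regret(means, counts):
    """Sum over arms of (gap to the best mean) times (number of pulls)."""
    best = max(means)
    return sum((best - m) * c for m, c in zip(means, counts))


def simulate(means, n, runs, seed=0):
    """Return the pseudo-regret of `runs` independent UCB1 runs."""
    rng = random.Random(seed)
    return [pseudo_regret(means, ucb1(means, n, rng)) for _ in range(runs)]
```

Calling, for example, `simulate([0.5, 0.6], n=1000, runs=200)` and comparing the largest value with the average gives a rough empirical view of how far the regret of individual runs can deviate from its expectation.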
We consider the framework of stochastic multi-armed bandit problems and study the possibilities and ...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit...
This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit...
This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit...
Algorithms based on upper-confidence bounds for balancing exploration and expl...
Algorithms based on upper-confidence bounds for balancing exploration and expl...