This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. [2] exhibit a policy such that, with probability at least 1-1/n, the regret of the policy is of order log n. They have also shown that this property is not shared by the popular UCB1 policy of Auer et al. [3]. This work first answers an open question: it extends this negative result to any anytime policy (i.e., any policy that does not take the number of plays n into account). Another contribution of this paper is to design robust anytime policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible di...
This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit...
We design new policies that ensure both worst-case optimality for expected regret and light-tailed r...