We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$, is allowed to be a generic measurable space and the mean-payoff function is "locally Lipschitz" with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if $\mathcal{X}$ is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of growth of the regret is independent of the dimension of the space.
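To make the optimistic principle behind HOO concrete, here is a minimal sketch, assuming the arm space is the unit interval $[0, 1]$ split into a binary tree of cells and the smoothness parameters `nu` and `rho` of the dissimilarity are known; the names `hoo`, `Node`, `b_value`, and the reward function `f` are illustrative choices, not taken from the paper.

```python
import math
import random

class Node:
    """One cell of the hierarchical partition of the arm space [lo, hi]."""
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.count = 0        # times a point of this cell was played
        self.mean = 0.0       # empirical mean payoff of those plays
        self.children = None  # expanded lazily

def hoo(f, n_rounds, nu=1.0, rho=0.5):
    """Sketch of hierarchical optimistic optimization on X = [0, 1].

    f(x) returns a noisy payoff; nu and rho encode the (assumed known)
    smoothness of the mean-payoff function near its maxima.
    """
    root = Node(0.0, 1.0, 0)
    t = 0

    def b_value(node):
        if node.count == 0:
            return float("inf")  # unvisited cells are maximally optimistic
        # U-value: empirical mean + exploration bonus + smoothness term.
        u = (node.mean
             + math.sqrt(2.0 * math.log(t) / node.count)
             + nu * rho ** node.depth)
        # B-value: no better than the most promising child cell.
        if node.children:
            u = min(u, max(b_value(c) for c in node.children))
        return u

    for t in range(1, n_rounds + 1):
        # Descend the tree, always following the child with the larger B-value.
        path, node = [root], root
        while node.children:
            node = max(node.children, key=b_value)
            path.append(node)
        # Expand the chosen leaf and play a point drawn from its cell.
        mid = 0.5 * (node.lo + node.hi)
        node.children = [Node(node.lo, mid, node.depth + 1),
                         Node(mid, node.hi, node.depth + 1)]
        reward = f(random.uniform(node.lo, node.hi))
        # Update counts and empirical means along the root-to-leaf path.
        for visited in path:
            visited.count += 1
            visited.mean += (reward - visited.mean) / visited.count
    return root
```

For instance, `hoo(lambda x: 1 - abs(x - 0.3) + random.gauss(0, 0.1), 1000)` concentrates its plays near the maximizer $x = 0.3$. The sketch recomputes B-values recursively for brevity; the paper's version instead maintains B-values in the tree and refreshes them after each round rather than recomputing them from scratch.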
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selection...
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit...
This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit...
We consider a generalization of stochastic bandit problems where the set of arms...
We consider the setting of stochastic bandit problems with a continuum of arms...
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the...
We consider the problem of finding the best arm in a stochastic multi-armed bandit...
In this paper we consider the problem of online stochastic optimization of a...
Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several...
We consider a stochastic bandit problem with infinitely many arms. In this setting...
We consider a bandit problem where at any time, the decision maker can add new arms to her consideration...
Bandit algorithms are concerned with trading exploration with exploitation...
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function...
In the X-armed bandit problem, an agent sequentially interacts with an environment that yields a reward based...