Stochastic multi-armed bandit algorithms are used to solve the exploration-exploitation dilemma in sequential optimization problems. Algorithms based on upper confidence bounds offer strong theoretical guarantees, are easy to implement, and are efficient in practice. We consider a new bandit setting, called “scratch-games”, in which arm budgets are limited and rewards are drawn without replacement. Using Serfling's inequality, we propose an upper confidence bound algorithm adapted to this setting. We show that our bound on the expected number of plays of a suboptimal arm is lower than that of the UCB1 policy. We illustrate this result on both synthetic problems and realistic problems (ad-serving and email campaign optimization).
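The core idea can be sketched in code, under stated assumptions. Serfling's inequality bounds the deviation of the mean of n draws taken without replacement from a population of N values in [0, 1]. It has the same form as Hoeffding's bound, but the variance term is multiplied by the finite-population factor 1 - (n - 1)/N. This sketch assumes the index is UCB1's exploration term sqrt(2 ln t / n) scaled by that factor. It also assumes that an arm whose budget is exhausted is simply removed from play. The names `serfling_ucb_index` and `play_scratch_game`, and the exact constant in the index, are hypothetical illustrations, not necessarily the algorithm proposed in the paper.

```python
import math
import random


def serfling_ucb_index(mean, n, budget, t):
    """Hypothetical UCB index for an arm sampled without replacement.

    mean:   empirical mean of the n rewards drawn so far from this arm
    n:      number of draws from this arm so far (n >= 1)
    budget: total number of tickets N the arm started with
    t:      total number of draws over all arms so far (t >= 1)
    """
    # Serfling's finite-population factor: it shrinks as the arm's budget is
    # consumed, so the confidence radius is tighter than UCB1's.
    shrink = 1.0 - (n - 1) / budget
    # Assumption: UCB1's exploration term, scaled by the finite-population factor.
    return mean + math.sqrt(shrink * 2.0 * math.log(t) / n)


def play_scratch_game(arms, horizon, seed=0):
    """Simulate a scratch game: each arm is a finite list of 0/1 tickets,
    and each ticket can be drawn at most once."""
    rng = random.Random(seed)
    remaining = [list(tickets) for tickets in arms]
    for tickets in remaining:
        rng.shuffle(tickets)  # random draw order, without replacement
    budgets = [len(tickets) for tickets in arms]
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    total, t = 0.0, 0
    for _ in range(horizon):
        live = [i for i in range(len(arms)) if remaining[i]]
        if not live:
            break  # every arm's budget is exhausted
        untried = [i for i in live if counts[i] == 0]
        if untried:
            i = untried[0]  # play each arm once before using the index
        else:
            i = max(live, key=lambda k: serfling_ucb_index(
                sums[k] / counts[k], counts[k], budgets[k], t))
        reward = remaining[i].pop()  # the ticket is removed from the arm
        counts[i] += 1
        sums[i] += reward
        total += reward
        t += 1
    return total, counts


arms = [[1] * 30 + [0] * 70, [1] * 60 + [0] * 40]
total, counts = play_scratch_game(arms, horizon=150)
print(total, counts)
```

Because a live arm always has n < N, the factor 1 - (n - 1)/N stays strictly positive, and it is always at most 1. The sketch's index is therefore never larger than the plain UCB1 index for the same mean, n and t. This is the intuition behind a smaller bound on plays of suboptimal arms.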
The bandit problem models a sequential decision process between a player and an environment. In the ...
We study the problem of combinatorial pure exploration in the stochastic multi-armed bandit problem....
The multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in t...
We consider a stochastic linear bandit problem in which the rewards are not only subject to random n...
This paper considers the problem of maximizing an expectation function over a ...
We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stoc...
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the...
We consider multi-armed bandit problems where the number of arms is larger than the possible number ...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
A multi-armed bandit problem is a search problem on which a learning agent must select the o...
Algorithms based on upper-confidence bounds for balancing exploration and expl...
10+18 pages. We study reward maximisation in a wide class of structured stochas...
The paper proposes a novel upper confidence bound (UCB) procedure for identifying the arm with the l...
This paper introduces and addresses a wide class of stochastic bandit problems...
The stochastic multi-armed bandit problem is an important model for studying the exploration-exploit...