We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent and uniformly distributed over $[0,1]$. Rewards 1 and 0 are referred to as a success and a failure, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and until the first $m$ failures, respectively, where $m$ is a fixed parameter. This two-target algorithm achieves a long-term average regret in $\sqrt{2n}$ for a large parameter $m$ and a known time horizon $n$. This regret is optimal and strictly less than the regret achieved by the best known algorithms, which is in $2\sqrt{n}$. The results are...
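The two-target rule described in the abstract can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the target values, so `target1` and `target2` are hypothetical caller-supplied parameters, the function name `two_target_policy` is invented here, and a success is taken to be a reward of 1.

```python
import random


def two_target_policy(n, m, target1, target2, rng=None):
    """Simulate n plays of a two-target rule; return the total reward.

    target1 and target2 are hypothetical thresholds (not taken from the
    abstract): the number of successes required before the first failure
    and before the m-th failure, respectively.
    """
    rng = rng or random.Random()
    total = 0
    t = 0
    while t < n:
        theta = rng.random()  # new arm: mean reward uniform on [0, 1]
        successes = 0
        failures = 0
        exploit = False
        while t < n:
            reward = 1 if rng.random() < theta else 0
            t += 1
            total += reward
            if exploit:
                continue  # arm accepted: keep playing it until the horizon
            if reward == 1:
                successes += 1
                continue
            failures += 1
            # First target: too few successes before the first failure.
            if failures == 1 and successes < target1:
                break  # discard this arm and draw a new one
            # Second target: checked when the m-th failure occurs.
            if failures == m:
                if successes < target2:
                    break
                exploit = True
    return total


if __name__ == "__main__":
    # Arbitrary illustrative values; the thresholds are not tuned.
    n = 10_000
    reward = two_target_policy(n, m=3, target1=10, target2=150,
                               rng=random.Random(0))
    print("regret relative to always earning 1:", n - reward)
```

Passing the thresholds in as arguments keeps the sketch neutral about how they should scale with the horizon $n$ and the parameter $m$, which the abstract leaves unspecified.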
This paper investigates stochastic and adversarial combinatorial multi-armed b...
The standard Bernoulli two-armed bandit model is modified by terminating the choice problem ...
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
In this paper, we consider the problem of multi-armed bandits with a large, possibly infi...
We consider a stochastic bandit problem with infinitely many arms. In this set...
We consider a bandit problem consisting of a sequence of n choices from an infinite number of Bernou...
We consider multi-armed bandit problems where the number of arms is larger tha...
In this paper we investigate the multi-armed bandit problem, where each arm generates an infinite se...
We consider a bandit problem where at any time, the decision maker can add new arms to her considera...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of ...
We consider a situation where an agent has $T$ resources to be allocated to a larger number $N$ of ...
In this thesis, we study strategies for sequential resource allocation, under the so-called stochast...