We consider a bandit problem where at any time, the decision maker can add new arms to her consideration set. A new arm is queried at a cost from an "arm-reservoir" containing finitely many "arm-types," each characterized by a distinct mean reward. The query cost is reflected in a diminishing probability of the returned arm being optimal, unbeknownst to the decision maker; this feature encapsulates defining characteristics of a broad class of operations-inspired online learning problems, e.g., those arising in markets with churn, or those involving allocations subject to costly resource acquisition. The decision maker's goal is to maximize her cumulative expected payoffs over a sequence of n pulls, oblivious to the statistical properties as we...
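The arm-reservoir mechanism described above can be sketched in a few lines; everything here (the arm-type means, the particular cost-to-quality link, and all function names) is a hypothetical illustration, not the paper's model.

```python
import random

# Illustrative sketch of an "arm-reservoir": finitely many arm-types
# with distinct mean rewards; querying at a higher cost raises the
# probability that the returned arm is the optimal type.

MEANS = [0.3, 0.5, 0.9]  # hypothetical arm-types; 0.9 is optimal

def query_reservoir(cost, rng=random):
    """Sample an arm-type from the reservoir.

    The link p_optimal = 0.2 + cost is a made-up modeling choice that
    captures the qualitative feature: cheaper queries are less likely
    to return the optimal arm-type.
    """
    p_optimal = min(1.0, 0.2 + cost)
    if rng.random() < p_optimal:
        return max(MEANS)
    return rng.choice(MEANS[:-1])

def pull(mean, rng=random):
    """Bernoulli reward with the arm's mean."""
    return 1.0 if rng.random() < mean else 0.0

# The decision maker pays the query cost, receives an arm of unknown
# type, and then collects rewards from repeated pulls.
rng = random.Random(0)
arm_mean = query_reservoir(cost=0.5, rng=rng)
payoff = sum(pull(arm_mean, rng) for _ in range(100)) - 0.5
```

The sketch only mimics the trade-off at the heart of the model: spending more per query tilts the reservoir toward optimal arms, while the decision maker never observes which type she actually received.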
We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm...
We consider a variant of the stochastic multi-armed bandit with K arms where t...
Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several o...
We consider the problem of finding the best arm in a stochastic multi-armed ba...
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the e...
We consider a stochastic bandit problem with infinitely many arms. In this set...
In this thesis, we study strategies for sequential resource allocation, under the so-called stochast...
This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., those sequential selectio...
We consider multi-armed bandit problems where the number of arms is larger tha...
We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of ...
We consider the problem of near-optimal arm identification in the fixed confid...
We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the...
We consider a stochastic bandit problem with a possibly infinite number of arms. We write p∗ for the...
The multi-armed bandit (MAB) problem is a classic example of the exploration-exploitation dilemma. I...
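The exploration-exploitation dilemma mentioned above is commonly illustrated with the standard UCB1 index policy; the sketch below is a generic textbook version (not the algorithm of any paper listed here), with arm means chosen purely for demonstration.

```python
import math
import random

# Minimal UCB1 sketch: each round, pull the arm maximizing
# empirical mean + sqrt(2 ln t / n_i). The bonus term forces
# exploration early and fades as an arm accumulates pulls.

def ucb1(means, n_rounds, rng=random):
    """Run UCB1 on Bernoulli arms; return per-arm pull counts."""
    k = len(means)
    counts = [0] * k      # times each arm was pulled
    sums = [0.0] * k      # cumulative reward per arm
    for t in range(n_rounds):
        if t < k:
            arm = t       # initialization: pull each arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# With a clear gap between the arms, the better arm (mean 0.8)
# ends up with the large majority of the pulls.
counts = ucb1([0.2, 0.8], n_rounds=2000, rng=random.Random(0))
```

The square-root bonus is what resolves the dilemma: under-sampled arms carry a large bonus and get explored, while well-sampled arms are judged almost purely on their empirical means.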