This paper is in the field of stochastic Multi-Armed Bandits (MABs), i.e., sequential selection techniques that learn online using only the feedback given by the chosen option (a.k.a. arm). We study a particular case of the rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing. This characteristic allows the design of specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound that depends on the properties of the instance and, under certain circumstances, is of order $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. We empirically compare our algorithms...
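As a concrete illustration of the rising-bandit setting described in this abstract, the sketch below simulates a rested arm whose expected payoff is monotonically non-decreasing in the number of its own pulls and runs a simple sliding-window optimistic policy on it. The growth curve, noise model, window size `h`, and the index itself are illustrative assumptions, not the paper's R-ed-UCB construction; this is only a minimal sketch of the kind of recency-based optimism such algorithms build on.

```python
import numpy as np

rng = np.random.default_rng(0)

class RestedRisingArm:
    """Rested arm: its expected payoff is monotonically non-decreasing
    in the number of its own pulls. The saturating-exponential growth
    curve below is a hypothetical choice, not taken from the paper."""

    def __init__(self, start, cap, rate):
        self.start, self.cap, self.rate = start, cap, rate
        self.pulls = 0

    def expected_payoff(self):
        return self.cap - (self.cap - self.start) * np.exp(-self.rate * self.pulls)

    def pull(self):
        mean = self.expected_payoff()
        self.pulls += 1  # rested: the mean moves only when this arm is pulled
        return float(np.clip(mean + 0.1 * rng.standard_normal(), 0.0, 1.0))


def window_mean(rewards, h):
    """Mean of the last h rewards: under rising payoffs, the most recent
    samples are the least biased estimates of the current mean."""
    recent = rewards[-h:]
    return float(np.mean(recent)) if recent else 0.0


T, h = 2000, 50
arms = [RestedRisingArm(0.10, 0.90, 0.005),
        RestedRisingArm(0.30, 0.60, 0.020),
        RestedRisingArm(0.50, 0.55, 0.100)]
history = [[] for _ in arms]

for t in range(1, T + 1):
    # Optimistic index: recent-window mean plus a UCB-style bonus
    # (an illustrative index, not the paper's estimator).
    index = [window_mean(history[i], h)
             + np.sqrt(2.0 * np.log(t) / max(len(history[i]), 1))
             for i in range(len(arms))]
    chosen = int(np.argmax(index))
    history[chosen].append(arms[chosen].pull())

print([len(hist) for hist in history])  # pull counts per arm
```

The window discards old samples because, unlike in stationary UCB, early rewards systematically underestimate a rising arm's current mean.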
We consider a variant of the stochastic multi-armed bandit with K arms where t...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We consider a non-stationary formulation of the stochastic multi-armed bandit where the rewards are ...
We consider a bandit problem where at any time, the decision maker can add new arms to her considera...
We introduce and analyze a best arm identification problem in the rested bandit setting, wherein arm...
In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where...
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making probl...
We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at ea...
The multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in t...
We consider a generalization of stochastic bandits where the set of arms, $\mathcal{X}$...
In this paper we consider the problem of online stochastic optimization of a l...
We consider a generalization of stochastic bandit problems where the set of ar...
We consider the problem of finding the best arm in a stochastic multi-armed ba...