We consider stochastic sequential learning problems where the learner can observe the \textit{average reward of several actions}. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of actions to observe represents some (geographical) area. This setting matters because in these applications it is actually \textit{cheaper} to observe the average reward of a group of actions than the reward of a single action. We show that when the reward is \textit{smooth} over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while \textit{minimizing the sensing cost}. In this paper we propose CheapUCB, an algorithm that matches the regre...
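The two ingredients of the CheapUCB abstract, a reward that is smooth over a graph of neighboring actions and a single observation that averages over a group of actions, can be made concrete with a short sketch. It assumes, as is common in graph bandit work, that smoothness is measured by the Laplacian quadratic form $f^\top L f = \sum_{(i,j) \in E} A_{ij} (f_i - f_j)^2$ with $L = D - A$; the abstract does not state which measure CheapUCB uses. The helper names (`laplacian`, `smoothness`, `observe_group_average`) and the Gaussian noise model are hypothetical, and this is not the CheapUCB algorithm itself.

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial graph Laplacian L = D - A of a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj


def smoothness(adj: np.ndarray, f: np.ndarray) -> float:
    """Laplacian quadratic form f^T L f, equal to the sum over edges of A_ij (f_i - f_j)^2.

    Small values mean the reward changes little between neighboring actions.
    """
    return float(f @ laplacian(adj) @ f)


def observe_group_average(f: np.ndarray, group, rng: np.random.Generator,
                          noise_std: float = 0.1) -> float:
    """Hypothetical sensing model: one noisy sample of the mean reward over `group`."""
    idx = np.asarray(group, dtype=int)
    return float(f[idx].mean() + rng.normal(0.0, noise_std))


# Usage: a path graph 0 - 1 - 2 - 3 with unit edge weights.
adj = np.zeros((4, 4))
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

smooth_reward = np.array([1.0, 1.1, 1.2, 1.3])   # neighbors differ by 0.1
rough_reward = np.array([1.0, 0.0, 1.0, 0.0])    # neighbors differ by 1.0
rng = np.random.default_rng(0)

print(smoothness(adj, smooth_reward))   # about 0.03
print(smoothness(adj, rough_reward))    # 3.0
# One observation covering the "area" {0, 1, 2}, instead of three separate pulls.
print(observe_group_average(smooth_reward, [0, 1, 2], rng))
```

Only the slowly varying reward vector has a small quadratic form, which is the kind of structure that lets a learner trade single-action pulls for cheaper group observations.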
In this paper, we investigate the stochastic contextual bandit with general function space and graph...
Smooth functions on graphs have wide applications in manifold and semi-supervi...
This manuscript deals with the estimation of the optimal rule and its mean reward in a simple ban...
The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively stu...
We investigate the structural properties of certain sequential decision-making problems with limited...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
We study a graph bandit setting where the objective of the learner is to detec...
In sparse linear bandits, a learning agent sequentially selects an action and receives reward feedbac...
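The interaction protocol in the sparse linear bandit abstract, where an agent sequentially selects an action and receives reward feedback, can be sketched as a small simulated environment. The sketch assumes the standard sparse linear reward model $r_t = \langle \theta, x_t \rangle + \varepsilon_t$, with Gaussian noise and a parameter $\theta \in \mathbb{R}^d$ that has only $s$ nonzero entries; the truncated abstract does not specify the noise model or the action set. The class `SparseLinearBandit` and its parameters are hypothetical, and the random policy in the usage part is only a baseline, not the method of that work.

```python
import numpy as np


class SparseLinearBandit:
    """Hypothetical environment: reward = <theta, x> + Gaussian noise,
    where theta has only `s` nonzero coordinates out of `d`."""

    def __init__(self, d: int, s: int, noise_std: float = 0.1, seed: int = 0):
        self.rng = np.random.default_rng(seed)
        self.theta = np.zeros(d)
        support = self.rng.choice(d, size=s, replace=False)  # unknown sparse support
        self.theta[support] = self.rng.normal(size=s)
        self.noise_std = noise_std

    def pull(self, x: np.ndarray) -> float:
        """Noisy reward feedback for the selected action vector x."""
        return float(self.theta @ x + self.rng.normal(0.0, self.noise_std))

    def regret(self, actions: np.ndarray, x: np.ndarray) -> float:
        """Instantaneous pseudo-regret of x against the best row of `actions`."""
        return float((actions @ self.theta).max() - self.theta @ x)


# Usage: a baseline agent that picks actions uniformly at random.
env = SparseLinearBandit(d=20, s=3, seed=1)
agent_rng = np.random.default_rng(2)
actions = agent_rng.normal(size=(50, 20))  # a fixed finite action set
total_regret = 0.0
for t in range(100):
    x = actions[agent_rng.integers(len(actions))]
    reward = env.pull(x)  # the only feedback a learning agent would see
    total_regret += env.regret(actions, x)
print(f"random policy regret over 100 rounds: {total_regret:.2f}")
```

A sparsity-aware learner would use the observed rewards to estimate the few nonzero coordinates of $\theta$; the random baseline ignores them, so its regret grows roughly linearly with the number of rounds.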
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $\th...
This thesis studies several extensions of the multi-armed bandit problem, where a learner sequentially s...