We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting arises in many monitoring and surveillance applications, where the set of actions to observe represents some (geographical) area. The setting matters because in these applications it is actually cheaper to observe the average reward of a group of actions than the reward of a single action. We show that when the reward is smooth over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while minimizing the sensing cost. In this paper we propose CheapUCB, an algorithm that matches the regret guarantees ...
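The observation model described in the abstract can be illustrated with a small sketch. This is not CheapUCB itself, only a toy illustration of why averaging over a group of neighboring actions is informative when the reward is smooth over a graph. The ring graph, the cosine reward, and the helper names `ring_neighborhood` and `observe_average` are all assumptions made for this example and do not come from the paper.

```python
import math
import random

def ring_neighborhood(center, radius, n_nodes):
    """Nodes within `radius` hops of `center` on a ring graph (assumed topology)."""
    return [(center + k) % n_nodes for k in range(-radius, radius + 1)]

def observe_average(mean_rewards, group, noise_std=0.0, rng=None):
    """One observation: the average reward over a group of actions, plus Gaussian noise."""
    rng = rng or random.Random(0)
    avg = sum(mean_rewards[i] for i in group) / len(group)
    return avg + rng.gauss(0.0, noise_std)

n_nodes = 100
# A reward that is smooth over the ring: neighboring nodes have similar means.
mean_rewards = [math.cos(2 * math.pi * i / n_nodes) for i in range(n_nodes)]

# Observe the average over node 0 and its neighbors up to 3 hops away.
group = ring_neighborhood(center=0, radius=3, n_nodes=n_nodes)
group_avg = observe_average(mean_rewards, group)

# For a smooth reward, the group average stays close to the center's own reward.
gap = abs(group_avg - mean_rewards[0])
print(f"group size = {len(group)}, gap to single-node reward = {gap:.4f}")
```

In this toy setup, one group observation stands in for seven single-action observations, and the smoothness of the reward keeps the averaging error small. How the sensing cost depends on the group size is left unspecified here, since the abstract does not state it.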
In sparse linear bandits, a learning agent sequentially selects an action and receives reward feedbac...
We consider adversarial multi-armed bandit problems where the learner is allow...
We consider a situation where an agent has $T$ resources to be allocated to a larger number $N$ of ...
We consider stochastic sequential learning problems where the learner can observe the \textit{averag...
We consider stochastic sequential learning problems where the learner can obse...
The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively stu...
Smooth functions on graphs have wide applications in manifold and semi-supervi...
We investigate the structural properties of certain sequential decision-making problems with limited...
We study a graph bandit setting where the objective of the learner is to detec...
This thesis studies several extensions of the multi-armed bandit problem, where a learner sequentially s...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
We consider a linear stochastic bandit problem where the dimension $K$ of the unknown parameter $\th...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...