We study a distributed multi-armed bandit setting among a population of $n$ memory-constrained nodes in the gossip model: at each round, every node locally adopts one of $m$ arms, observes a reward drawn from the arm's (adversarially chosen) distribution, and then communicates with a randomly sampled neighbor, exchanging information to determine its policy in the next round. We introduce and analyze several families of decentralized dynamics for this task: each node's decision is entirely local and depends only on its most recently obtained reward and that of the neighbor it sampled. We show a connection between the global evolution of these decentralized dynamics and a certain class of "zero-sum" multiplicative weights update algorithms...
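A minimal simulation sketch of one such dynamic may help fix ideas. Everything here is illustrative: the Bernoulli reward means, the population size, the horizon, and in particular the "copy the neighbor's arm on a better reward" rule are assumptions chosen for concreteness, not the paper's actual families of dynamics.

```python
import random

def gossip_round(arm_means, node_arms, rng):
    """One round of a hypothetical decentralized dynamic: every node
    pulls its currently adopted arm, samples a uniformly random peer,
    and adopts the peer's arm if the peer's observed reward was higher.
    The rule is local and memoryless, using only the node's latest
    reward and that of the sampled neighbor."""
    n = len(node_arms)
    # Bernoulli rewards with the given means (assumed for illustration).
    rewards = [1.0 if rng.random() < arm_means[a] else 0.0 for a in node_arms]
    new_arms = list(node_arms)
    for i in range(n):
        j = rng.randrange(n)  # uniformly sampled communication partner
        if rewards[j] > rewards[i]:
            new_arms[i] = node_arms[j]  # copy the better-performing neighbor
    return new_arms

rng = random.Random(0)
means = [0.3, 0.5, 0.8]                                  # m = 3 arms
arms = [rng.randrange(len(means)) for _ in range(1000)]  # n = 1000 nodes
for _ in range(200):
    arms = gossip_round(means, arms, rng)
print(sum(a == 2 for a in arms) / len(arms))  # fraction on the best arm
```

Under this toy rule the population concentrates on high-mean arms over time, which is the kind of global behavior the paper relates to multiplicative-weights-style updates.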
This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concav...
We consider a linear stochastic bandit problem involving $M$ agents that can collaborate via a centr...
In this thesis, we study sequential multi-armed bandit and reinforcement learning in the federated s...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...
We consider multi-armed bandit problems in social groups wherein each individual has bounded memory ...
We consider a bandit problem where at any time, the decision maker can add new arms to her considera...
We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on a network ...
We study decentralized stochastic linear bandits, where a network of N agents acts cooperatively to ...
The multi-armed bandit problem has attracted remarkable attention in the machi...
The uncoordinated spectrum access problem is studied using a multi-player multi-armed bandits framew...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...
Population learning in dynamic economies has been traditionally studied in over-simplified settings ...
The two-armed bandit problem is a classical optimization problem where a decision maker sequentially...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...