We consider the problem of asynchronous online combinatorial optimization on a network of communicating agents. At each time step, some of the agents are stochastically activated, requested to make a prediction, and the system pays the corresponding loss. Then, neighbors of active agents receive semi-bandit feedback and exchange some succinct local information. The goal is to minimize the network regret, defined as the difference between the cumulative loss of the predictions of active agents and that of the best action in hindsight, selected from a combinatorial decision set. The main challenge in such a context is to control the computational complexity of the resulting algorithm while retaining minimax optimal regre...
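The network regret described above can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: agents here pick actions uniformly at random, and semi-bandit feedback and neighbor communication are not modeled. The names `network_regret` and `simulate`, the choice of decision set (all subsets of size `m` out of `d` components), the activation probability `q`, and the convention that the comparator is charged once per activation are all assumptions made for illustration.

```python
import itertools
import random


def action_loss(action, loss_vec):
    """Linear loss of a combinatorial action: sum of its components' losses."""
    return sum(loss_vec[i] for i in action)


def network_regret(losses, active_sets, predictions, decision_set):
    """Cumulative loss of active agents minus that of the best fixed action.

    losses[t]      : list of per-component losses at step t
    active_sets[t] : list of agents activated at step t
    predictions[t] : dict mapping each active agent to its action
    decision_set   : list of candidate actions (tuples of component indices)
    """
    steps = range(len(losses))
    incurred = sum(
        action_loss(predictions[t][v], losses[t])
        for t in steps
        for v in active_sets[t]
    )
    # Assumption: the fixed comparator pays its loss once per activation.
    best = min(
        sum(len(active_sets[t]) * action_loss(a, losses[t]) for t in steps)
        for a in decision_set
    )
    return incurred - best


def simulate(T=50, d=4, m=2, n_agents=3, q=0.5, seed=0):
    """Run a toy protocol with uniformly random predictions (placeholder policy)."""
    rng = random.Random(seed)
    decision_set = list(itertools.combinations(range(d), m))
    losses, active_sets, predictions = [], [], []
    for _ in range(T):
        loss_vec = [rng.random() for _ in range(d)]
        # Each agent is activated independently with probability q.
        active = [v for v in range(n_agents) if rng.random() < q]
        preds = {v: rng.choice(decision_set) for v in active}
        losses.append(loss_vec)
        active_sets.append(active)
        predictions.append(preds)
    return network_regret(losses, active_sets, predictions, decision_set)
```

Calling `simulate()` returns the regret of the random placeholder policy on one sampled run; replacing the `rng.choice` line with a learning rule is where an actual algorithm would plug in.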
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...
The greedy algorithm has been extensively studied in the field of combinatorial optimization for decades....
We study decentralized stochastic linear bandits, where a network of N agents acts cooperatively to ...
This paper investigates stochastic and adversarial combinatorial multi-armed b...
We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on a network ...
Most work on sequential learning assumes a fixed set of actions that are available all the time. How...
We study an asynchronous online learning setting with a network of agents. At each time step, some o...
We address online linear optimization problems when the possible actions of the decision maker are r...
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making probl...
This paper investigates stochastic and adversarial combinatorial multi-armed bandit problems. In the...
We study the interplay between feedback and communication in a cooperative online learning setting w...
We consider the problem of online combinatorial optimization under semi-bandit...
We address the online linear optimization problem when the actions of ...
We formulate the following combinatorial multi-armed bandit (MAB) problem: There are random...
We study networks of communicating learning agents that cooperate to solve a common nonstochastic ba...