We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O( (m + K/n) \log (T) / \Delta )$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m \ll K$, this improves over the single-agent baseline regret of $O(K\log(T)/\Delta)$. In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$....
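To make the comparison between the two bounds concrete, here is a minimal sketch that evaluates both expressions with all hidden constants set to 1. The function names and the sample parameter values are illustrative assumptions, not part of the paper; only the two formulas come from the abstract above.

```python
import math


def collaborative_bound(m: int, K: int, n: int, T: int, gap: float) -> float:
    """Complete-graph bound (m + K/n) * log(T) / gap, constants dropped (assumption)."""
    return (m + K / n) * math.log(T) / gap


def single_agent_bound(K: int, T: int, gap: float) -> float:
    """Single-agent baseline K * log(T) / gap, constants dropped (assumption)."""
    return K * math.log(T) / gap


def bound_ratio(m: int, K: int, n: int) -> float:
    """Collaborative bound divided by the baseline; log(T) / gap cancels.

    Algebraically this equals m/K + 1/n, so it is small when m << K and n is large.
    """
    return (m + K / n) / K


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    m, K, n, T, gap = 2, 100, 20, 10**6, 0.1
    print(collaborative_bound(m, K, n, T, gap))
    print(single_agent_bound(K, T, gap))
    print(bound_ratio(m, K, n))
```

With the sample values $m = 2$, $K = 100$, $n = 20$, the ratio is $m/K + 1/n = 0.07$, which illustrates why the complete-graph bound improves on the baseline when $m \ll K$. The sketch says nothing about the line-graph lower bound, which depends on details not given in the abstract.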
We consider the partial observability model for multi-armed bandits, introduced by Mannor and Shamir...
This thesis studies several extensions of the multi-armed bandit problem, where a learner sequentially s...
We study networks of communicating learning agents that cooperate to solve a common nonstochastic ba...
We consider a linear stochastic bandit problem involving $M$ agents that can collaborate via a centr...
We study decentralized stochastic linear bandits, where a network of N agents acts cooperatively to ...
The multi-armed bandit (MAB) problem is a simple yet powerful framework that has been extensively stu...
We consider stochastic multi-armed bandits where the expected reward is a unimodal function over pa...
Multiplayer bandits have recently been extensively studied because of their application to cognitive...
We define and analyze a multi-agent multi-armed bandit problem in which decision-making agents can o...
We consider running multiple instances of multi-armed bandit (MAB) problems in parallel. A main moti...
We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on a network ...
We study a structured variant of the multi-armed bandit problem specified by a set of Bernoulli dist...
Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in...