We consider a linear stochastic bandit problem involving $M$ agents that can collaborate via a central server to minimize regret. A fraction $\alpha$ of these agents are adversarial and can act arbitrarily, leading to the following tension: while collaboration can potentially reduce regret, the adversaries can also disrupt learning. In this work, we provide a fundamental understanding of this tension by designing new algorithms that balance the exploration-exploitation trade-off via carefully constructed robust confidence intervals. We complement our algorithms with tight analyses. First, we develop a robust collaborative phased elimination algorithm that achieves $\tilde{O}\left(\alpha + 1/\sqrt{M}\right)\sqrt{dT}$ regret...
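To make the phased-elimination idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it replaces the linear model with a plain multi-armed bandit, uses a coordinate-wise trimmed mean as the robust aggregator, and picks confidence widths with ad hoc constants. The function names (`robust_phased_elimination`, `trimmed_mean`) and all parameter choices are illustrative assumptions; the only structure carried over from the abstract is a confidence width that combines a $1/\sqrt{nM}$ averaging term with an $O(\alpha)$ corruption term.

```python
import numpy as np

rng = np.random.default_rng(0)


def trimmed_mean(reports, num_trim):
    """Coordinate-wise trimmed mean: for each arm, drop the num_trim
    largest and num_trim smallest agent reports, then average the rest."""
    s = np.sort(reports, axis=0)            # shape (M, num_active_arms)
    if num_trim > 0:
        s = s[num_trim:-num_trim]
    return s.mean(axis=0)


def robust_phased_elimination(true_means, M, alpha, T, sigma=0.1, delta=0.05):
    """Simplified robust collaborative phased elimination (multi-armed
    stand-in for the paper's linear setting). In each phase, every agent
    pulls each surviving arm n times and sends its empirical means to the
    server; a ceil(alpha * M) subset reports arbitrary values instead.
    The server aggregates with a trimmed mean and eliminates arms whose
    upper confidence bound falls below the best lower confidence bound."""
    K = len(true_means)
    active = np.arange(K)
    num_bad = int(np.ceil(alpha * M))       # adversaries tolerated / trimmed
    num_honest = M - num_bad
    t, phase = 0, 1
    while t < T and len(active) > 1:
        n = 2 ** phase                      # per-agent pulls per active arm
        reports = np.empty((M, len(active)))
        for a in range(num_honest):         # honest agents: noisy empirical means
            noise = sigma * rng.standard_normal((n, len(active)))
            reports[a] = (true_means[active] + noise).mean(axis=0)
        for a in range(num_honest, M):      # adversaries: arbitrary reports
            reports[a] = rng.uniform(-10.0, 10.0, size=len(active))
        agg = trimmed_mean(reports, num_bad)
        # Confidence width: an averaging term shrinking like 1/sqrt(n*M)
        # plus an O(alpha) corruption term, mirroring the alpha + 1/sqrt(M)
        # structure of the regret bound above (constants are ad hoc).
        log_term = np.sqrt(2.0 * np.log(2.0 * K * T / delta) / n)
        width = sigma * log_term / np.sqrt(num_honest) + 2.0 * alpha * sigma * log_term
        t += n * len(active)                # per-agent sample budget consumed
        active = active[agg + width >= (agg - width).max()]
        phase += 1
    return active


true_means = np.array([0.9, 0.7, 0.5, 0.2])   # unknown to the learner
print(robust_phased_elimination(true_means, M=20, alpha=0.1, T=10_000))
# With these toy settings the surviving set typically collapses to arm 0.
```

The point of trimming $\lceil \alpha M \rceil$ reports from each end is that the per-arm aggregate always lies within the range of honest reports, so collaboration still helps (the $1/\sqrt{M}$ term) while adversaries can only inflict a bounded $O(\alpha)$ bias.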