We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion of convex smoothing, we define a novel family of algorithms with minimax optimal regret guarantees. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, matches the $O(\sqrt{NT})$ minimax regret with a smaller constant factor. Second, we show that a wide class of perturbation methods achieves near-optimal regret as low as $O(\sqrt{NT \log N})$, as long as the perturbation distribution has a bounded hazard function. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property and lead to near-optimal algorithms.
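As a concrete illustration of the perturbation approach, a minimal sketch follows. It uses Gumbel perturbations, for which the smoothed-argmax probabilities are available in closed form via the Gumbel-max trick (a softmax of the scaled cumulative gain estimates), so the resulting procedure coincides with EXP3 with importance-weighted gain estimates. The function name `gumbel_perturbation_bandit`, the gain oracle `gain_fn`, and the particular learning-rate choice `eta` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def gumbel_perturbation_bandit(gain_fn, N, T, eta=None, rng=None):
    """Perturbation-style adversarial bandit sketch with Gumbel noise.

    Playing argmax_i (G_hat[i] + Z_i / eta) with Z_i i.i.d. standard Gumbel
    selects arm i with probability softmax(eta * G_hat)[i] (Gumbel-max trick),
    so the smoothed sampling distribution is available in closed form and the
    update matches EXP3 with importance-weighted gain estimates.
    """
    rng = np.random.default_rng() if rng is None else rng
    if eta is None:
        eta = np.sqrt(np.log(N) / (N * T))  # standard O(sqrt(NT log N)) tuning
    G_hat = np.zeros(N)        # importance-weighted cumulative gain estimates
    total_gain = 0.0
    for t in range(T):
        # Closed-form smoothed-argmax probabilities under Gumbel perturbations.
        w = np.exp(eta * (G_hat - G_hat.max()))
        p = w / w.sum()
        arm = rng.choice(N, p=p)
        g = gain_fn(t, arm)    # only the chosen arm's gain in [0, 1] is observed
        total_gain += g
        G_hat[arm] += g / p[arm]   # unbiased estimate of the full gain vector
    return total_gain
```

As a usage sketch, calling the routine with a simulated adversary such as `gain_fn=lambda t, i: float(i == 0) * 0.6` runs the algorithm against an environment in which arm 0 pays 0.6 per round; other hazard-bounded perturbations (Weibull, Fréchet, Pareto, Gamma) do not admit closed-form probabilities and would require estimating them, e.g. by Monte Carlo resampling.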