In this paper, we study the stochastic bandit problem with k arms whose reward distributions are unknown, heavy-tailed, and corrupted, with time-invariant corruption distributions. At each iteration, the player chooses an arm. Given the chosen arm, the environment returns an uncorrupted reward with probability 1−ε and an arbitrarily corrupted reward with probability ε. In our setting, the uncorrupted reward may be heavy-tailed and the corrupted reward may be unbounded. We prove a lower bound on the regret showing that corrupted and heavy-tailed bandits are strictly harder than uncorrupted or light-tailed bandits. We observe that the environments can be categorised into hardness regimes depending on the suboptimality gap ∆, the variance σ, and the corrup...
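The per-pull reward model above is an ε-contamination mixture: with probability 1−ε the environment samples from the arm's true (possibly heavy-tailed) distribution, and with probability ε it returns an arbitrary value. As a minimal simulation sketch (the function name, the choice of Student-t and Cauchy distributions, and all parameter values are illustrative assumptions, not taken from the paper), one round of the interaction could look as follows:

import numpy as np

rng = np.random.default_rng(0)

def pull_arm(mean, eps, corruption_scale=1e3):
    # With probability eps, return an arbitrary (possibly unbounded) corrupted
    # reward; a large-scale Cauchy draw stands in for "arbitrary" here.
    if rng.random() < eps:
        return corruption_scale * rng.standard_cauchy()
    # Otherwise return an uncorrupted but heavy-tailed reward; a Student-t with
    # 2.5 degrees of freedom has finite variance but no finite fourth moment.
    return mean + rng.standard_t(df=2.5)

# One round of the protocol: the player picks an arm, the environment answers.
arm_means = [0.0, 0.5, 1.0]   # unknown to the player; illustrative values
eps = 0.05                    # corruption proportion
chosen = 2                    # arm index selected by some policy
reward = pull_arm(arm_means[chosen], eps)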
This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit...
We design new policies that ensure both worst-case optimality for expected regret and light-tailed r...
In the classical multi-armed bandit problem, d arms are available to the decis...
We investigate the regret-minimisation problem in a multi-armed bandit setting wi...
Motivated by models of human decision making proposed to explain commonly observed deviations from c...
In this paper, we consider stochastic multi-armed bandits (MABs) with heavy-tailed rewards, whose p-...
Algorithms based on upper-confidence bounds for balancing exploration and expl...
Classical regret minimization in a bandit framework involves a number of probability distributions ...
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi...
Algorithms based on upper confidence bounds for balancing exploration and expl...
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function...
Much of the literature on optimal design of bandit algorithms is based on minimization of expected r...