Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost distributions: the classic $K$-armed bandit and the linearly parameterized bandit. In both settings, we propose algorithms that are inspired by Upper Confidence Bound (UCB), incorporate cost distortions, and exhibit sublinear regret assuming Hölder continuous weight distortion functions. For the $K$-armed setting, we show that the algorithm, called W-UCB, achieves problem-dependent regret $O(L^2 M^2 \log n/ \Delta^{\frac{2}{\alpha}-1})$, where $n$ is the number of plays, $\Delta$ is the gap in distorted exp...
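To make the W-UCB idea concrete, the following is a minimal Python sketch of a distortion-aware UCB index: each arm's distorted value is estimated by plugging the empirical complementary CDF into the weight function w, and the usual exploration bonus is widened to reflect the Hölder continuity of w. The Prelec weight function, the exact bonus constants, and the reward-maximization framing are illustrative assumptions on my part; the paper's algorithm is stated for cost distributions and its confidence widths may differ.

import numpy as np

def prelec(p, eta=0.65):
    # Prelec probability-weighting function (inverse-S shaped), a standard
    # choice in models of distorted decision making. The paper only assumes
    # a Hoelder continuous weight function; this particular w is illustrative.
    p = np.clip(p, 1e-12, 1.0)
    return np.exp(-((-np.log(p)) ** eta))

def distorted_value(samples, w):
    # Plug-in estimate of the distorted expectation
    #   W(F) = integral over x >= 0 of w(P(X > x)) dx,
    # using the empirical complementary CDF: on [x_(k-1), x_(k)) the
    # empirical tail probability equals (n - k + 1) / n.
    xs = np.sort(np.asarray(samples, dtype=float))
    n = len(xs)
    est, prev = 0.0, 0.0
    for k, x in enumerate(xs, start=1):
        est += w((n - k + 1) / n) * (x - prev)
        prev = x
    return est

def w_ucb(pull, K, horizon, w, L=1.0, alpha=1.0):
    # Distortion-aware UCB: index = plug-in distorted value + exploration
    # bonus. The width L * (2 log t / T_i)^(alpha / 2) combines the Hoelder
    # property |W(F_hat) - W(F)| <= L * ||F_hat - F||_inf^alpha with DKW-type
    # concentration of the empirical CDF; the constants here are assumptions.
    samples = [[pull(i)] for i in range(K)]          # one initial pull per arm
    for t in range(K + 1, horizon + 1):
        def index(a):
            n_a = len(samples[a])
            return distorted_value(samples[a], w) + L * (2.0 * np.log(t) / n_a) ** (alpha / 2.0)
        i = max(range(K), key=index)
        samples[i].append(pull(i))
    return [len(s) for s in samples]

# Toy run with two Beta reward arms (rewards in [0, 1]).
rng = np.random.default_rng(0)
pull = lambda i: rng.beta([2.0, 5.0][i], [5.0, 2.0][i])
print(w_ucb(pull, K=2, horizon=2000, w=prelec, alpha=0.65))

Note that with alpha = 1 and w the identity, the bonus reduces to the familiar UCB1-style width L * sqrt(2 log t / T_i), so the index recovers ordinary UCB as a special case.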
This work addresses the problem of regret minimization in non-stochastic multi...
Linear bandits have a wide variety of applications including recommendation systems, yet they make on...
This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit...
In this paper, we study the stochastic bandits problem with k unknown heavy-tailed and corrupted rew...
Much of the literature on optimal design of bandit algorithms is based on minimization of expected r...
We consider the problem of online learning in misspecified linear stochastic multi-armed bandit prob...
Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining p...
We investigate the regret-minimisation problem in a multi-armed bandit setting wi...
We improve the theoretical analysis and empirical performance of algorithms for the stochastic multi...
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function...
Despite the significant interest and progress in reinforcement learning (RL) problems with adversari...