Linear bandit algorithms yield $\tilde{\mathcal{O}}(n\sqrt{T})$ pseudo-regret bounds on compact convex action sets $\mathcal{K}\subset\mathbb{R}^n$, and two types of structural assumptions lead to better pseudo-regret bounds. When $\mathcal{K}$ is the simplex or an $\ell_p$ ball with $p\in]1,2]$, there exist bandit algorithms with $\tilde{\mathcal{O}}(\sqrt{nT})$ pseudo-regret bounds. Here, we derive bandit algorithms for some strongly convex sets beyond $\ell_p$ balls that enjoy pseudo-regret bounds of $\tilde{\mathcal{O}}(\sqrt{nT})$, which answers an open question from [BCB12, \S 5.5.]. Interestingly, when the action set is uniformly convex but not necessarily strongly convex, we obtain pseudo-regret bounds with a dimension dependency sm...
Narendra-Shapiro (NS) algorithms are bandit-type algorithms that have been introduced in t...
We consider a linear stochastic bandit problem where the dimension K of the unknown parameter is l...
We demonstrate a modification of the algorithm of Dani et al. for the online linear optimization prob...
Consider the online convex optimization problem, in which a player has to choose actions iterativel...
We provide the first algorithm for online bandit linear optimization whose regret after T rounds is ...
We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion...
The problem of stochastic convex optimization with bandit feedback (in the learning community) or w...
The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and...
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function...
Numerous machine learning problems require an exploration basis, a mechanism to explore the action s...
The safe linear bandit problem is a version of the classic linear bandit problem where the learner's...
Linear bandits have a wide variety of applications, including recommendation systems, yet they make on...