In contextual continuum-armed bandits, the contexts $x$ and the arms $y$ are both continuous and drawn from high-dimensional spaces. The payoff function $f(x,y)$ to be learned does not have a particular parametric form. The literature has shown that for Lipschitz-continuous functions, the optimal regret is $\tilde{O}(T^{\frac{d_x+d_y+1}{d_x+d_y+2}})$, where $d_x$ and $d_y$ are the dimensions of contexts and arms, and thus suffers from the curse of dimensionality. We develop an algorithm that achieves regret $\tilde{O}(T^{\frac{d_x+1}{d_x+2}})$ when $f$ is globally concave in $y$. Global concavity is a common assumption in many applications. The algorithm is based on stochastic approximation and estimates the gradient information in an online...
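The stochastic-approximation idea behind this abstract can be illustrated with a minimal sketch: estimate the gradient of the (concave-in-$y$) payoff from noisy two-point evaluations and follow it with a shrinking step size. Everything here is illustrative, assuming a toy concave payoff with optimum at $y^* = x$; it is not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d_y = 2  # arm dimension

# Toy concave payoff: for context x the optimal arm is y* = x.
# This specific f is an assumption for the demo, not from the paper.
def noisy_payoff(x, y):
    return -np.sum((y - x) ** 2) + 0.01 * rng.normal()

y = np.zeros(d_y)  # current arm estimate
T = 2000
for t in range(1, T + 1):
    x = np.full(d_y, 0.5)   # fixed context, for simplicity
    delta = t ** -0.25      # perturbation size, shrinking over rounds
    eta = t ** -0.75        # step size, shrinking over rounds
    # Two-point (Kiefer-Wolfowitz style) gradient estimate per coordinate
    grad = np.zeros(d_y)
    for i in range(d_y):
        e = np.zeros(d_y)
        e[i] = 1.0
        grad[i] = (noisy_payoff(x, y + delta * e)
                   - noisy_payoff(x, y - delta * e)) / (2 * delta)
    # Projected gradient ascent step on the arm
    y = np.clip(y + eta * grad, 0.0, 1.0)

print(np.round(y, 2))
```

Because the payoff is concave in $y$ and the step sizes satisfy the usual stochastic-approximation conditions ($\sum_t \eta_t = \infty$, $\eta_t \to 0$), the iterate drifts toward the optimal arm for the given context.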
We propose a novel algorithm for generalized linear contextual bandits (GLBs) with a regret bound su...
We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits...
We consider bandit problems involving a large (possibly infinite) collection of arms, in which the e...
We consider a generalization of stochastic bandits where the set of arms, $\cX...
We consider a generalization of stochastic bandit problems where the set of ar...
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful a...
We consider a situation where an agent has $T$ resources to be allocated to a larger number $N$ of ...
We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on ...
In many applications, e.g. in healthcare and e-commerce, the goal of a contextual bandit may be to l...
We consider the setting of stochastic bandit problems with a continuum of arms...
We consider a contextual online learning (multi-armed bandit) problem with high-dimensional covariat...
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function...
We consider an adversarial variant of the classic $K$-armed linear contextual bandit problem where t...
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function ...
We consider stochastic multi-armed bandit problems where the expected reward i...