We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on an observed context under mild nonparametric assumptions. We attain tight results for top-arm identification and a sublinear regret of Õ(T^{(1+D)/(2+D)}), where D is the context dimension, for a modified UCB algorithm that is simple to implement. We then give regret bounds that depend on the global intrinsic dimension and are independent of the ambient dimension. We also discuss recovering topological structures within the context space based on expected bandit performance and provide an extension to infinite-armed contextual bandits. Finally, we experimentally show the improvement of our algorithm over existing approaches for both simulated tasks and MNIST image classification...
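For illustration, a minimal sketch of how a binned contextual UCB of this general flavor is commonly implemented is given below, assuming contexts in [0, 1]^D, a fixed uniform grid over the context space, and a standard UCB1 confidence width. The class name BinnedUCB, the bin count, and the reward model in the usage snippet are hypothetical choices for this sketch and do not reproduce the paper's modified UCB algorithm.

import numpy as np

class BinnedUCB:
    """Illustrative binned contextual UCB (not the paper's modified UCB):
    partition [0, 1]^D into a uniform grid and run a UCB1 index
    independently within each cell."""

    def __init__(self, n_arms, dim, bins_per_axis=4):
        self.n_arms = n_arms
        self.dim = dim
        self.bins_per_axis = bins_per_axis
        n_bins = bins_per_axis ** dim
        self.counts = np.zeros((n_bins, n_arms))  # pulls per (cell, arm)
        self.sums = np.zeros((n_bins, n_arms))    # reward sums per (cell, arm)

    def _cell(self, context):
        # Map a context in [0, 1]^D to a flat grid-cell index.
        coords = np.minimum((context * self.bins_per_axis).astype(int),
                            self.bins_per_axis - 1)
        return int(np.ravel_multi_index(coords, (self.bins_per_axis,) * self.dim))

    def select_arm(self, context, t):
        c = self._cell(context)
        counts = self.counts[c]
        if counts.min() == 0:  # play each arm once per cell before using the index
            return int(np.argmin(counts))
        means = self.sums[c] / counts
        bonus = np.sqrt(2.0 * np.log(t + 1) / counts)  # UCB1 confidence width
        return int(np.argmax(means + bonus))

    def update(self, context, arm, reward):
        c = self._cell(context)
        self.counts[c, arm] += 1
        self.sums[c, arm] += reward

# Toy usage on a synthetic 2-dimensional context with 3 arms (assumed setup).
rng = np.random.default_rng(0)
bandit = BinnedUCB(n_arms=3, dim=2)
for t in range(1000):
    ctx = rng.random(2)
    arm = bandit.select_arm(ctx, t)
    reward = float(rng.random() < 0.3 + 0.2 * arm * ctx[0])  # Bernoulli reward
    bandit.update(ctx, arm, reward)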
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known bef...
We consider a bandit problem which involves sequential sampling from two populations (arms). Each ar...
In the classical multi-armed bandit problem, d arms are available to the decis...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits...
The bandit problem models a sequential decision process between a player and an environment. In the ...
Contextual multi-armed bandit algorithms are widely used in sequential decision tasks such as news a...
We consider a linear stochastic bandit problem where the dimension K of the unknown parameter is l...
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful a...
We propose a novel algorithm for generalized linear contextual bandits (GLBs) with a regret bound su...
Contextual multi-armed bandit (MAB) algorithms have shown promise for maximizing cumulative r...
In contextual continuum-armed bandits, the contexts $x$ and the arms $y$ are both continuous and dra...
The growing interest in complex decision-making and language modeling problems highlights the import...
Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several o...
Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail propert...