Upper confidence bound (UCB) based contextual bandit algorithms require knowledge of the tail properties of the reward distribution. Unfortunately, these tail properties are usually unknown or difficult to specify in real-world applications. Assuming a heavier tail than the ground truth slows the algorithm's learning, while assuming a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator of the contextual bandit UCB, computed from historical rewards, based on the multiplier bootstrap technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground-truth contextual bandit UCB. We further derive a ...
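As a rough illustration of the multiplier bootstrap idea sketched above, the following minimal Python/NumPy example estimates a UCB directly from historical rewards: i.i.d. standard normal multipliers re-weight the centered rewards, and the (1 - alpha) empirical quantile of the resulting bootstrap statistics serves as the confidence width. The function name bootstrap_ucb, the choice of normal multipliers, and all defaults are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def bootstrap_ucb(rewards, alpha=0.05, n_boot=1000, rng=None):
    """Multiplier-bootstrap upper confidence bound (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.shape[0]
    mean = rewards.mean()
    centered = rewards - mean  # X_i - mean

    # Bootstrap statistics T_b = (1/n) * sum_i W_{b,i} * (X_i - mean),
    # with i.i.d. N(0, 1) multipliers W_{b,i}.
    W = rng.standard_normal((n_boot, n))
    T = W @ centered / n

    # The (1 - alpha) empirical quantile of T estimates the UCB width,
    # so no explicit tail assumption on the rewards is needed.
    return mean + np.quantile(T, 1.0 - alpha)

# Usage: pick the arm with the largest bootstrapped UCB.
history = {0: [0.1, 0.4, 0.2, 0.3], 1: [0.9, 0.05, 0.3, 0.6]}
rng = np.random.default_rng(0)
ucbs = {arm: bootstrap_ucb(r, rng=rng) for arm, r in history.items()}
best_arm = max(ucbs, key=ucbs.get)
print(ucbs, best_arm)
```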
In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit ...
In this paper, we study the stochastic bandit problem with k unknown heavy-tailed and corrupted rew...
Despite the significant interest and progress in reinforcement learning (RL) problems with adversari...
The contextual bandit problem is typically used to model online applications such as articl...
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as ...
The bandit problem models a sequential decision process between a player and an environment. In the ...
Contextual combinatorial cascading bandit ($C^{3}$-bandit) is a powerful multi-armed bandit framew...
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful a...
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known bef...
We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on ...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
Multi-armed bandit is a well-formulated test bed ...
Recent works on neural contextual bandits have achieved compelling performance thanks to their abili...
Contextual bandit algorithms are essential for solving many real-world interac...
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly...