Upper confidence bound (UCB) based contextual bandit algorithms require knowledge of the tail properties of the reward distribution. Unfortunately, these tail properties are usually unknown or difficult to specify in real-world applications. Assuming a heavier tail than the ground truth slows the algorithm's learning, while assuming a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator of the contextual bandit UCB, computed from historical rewards, based on the multiplier bootstrap technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground-truth contextual bandit UCB. We further derive a ...
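As a rough illustration of the multiplier bootstrap idea sketched above, the following minimal Python/NumPy example estimates a UCB directly from historical rewards: i.i.d. standard normal multipliers re-weight the centered rewards, and the (1 - alpha) empirical quantile of the resulting bootstrap statistics serves as the confidence width. The function name bootstrap_ucb, the choice of normal multipliers, and all defaults are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def bootstrap_ucb(rewards, alpha=0.05, n_boot=1000, rng=None):
    """Multiplier-bootstrap upper confidence bound (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    n = rewards.shape[0]
    mean = rewards.mean()
    centered = rewards - mean  # X_i - mean

    # Bootstrap statistics T_b = (1/n) * sum_i W_{b,i} * (X_i - mean),
    # with i.i.d. N(0, 1) multipliers W_{b,i}.
    W = rng.standard_normal((n_boot, n))
    T = W @ centered / n

    # The (1 - alpha) empirical quantile of T estimates the UCB width,
    # so no explicit tail assumption on the rewards is needed.
    return mean + np.quantile(T, 1.0 - alpha)

# Usage: pick the arm with the largest bootstrapped UCB.
history = {0: [0.1, 0.4, 0.2, 0.3], 1: [0.9, 0.05, 0.3, 0.6]}
rng = np.random.default_rng(0)
ucbs = {arm: bootstrap_ucb(r, rng=rng) for arm, r in history.items()}
best_arm = max(ucbs, key=ucbs.get)
print(ucbs, best_arm)
```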
In the contextual linear bandit setting, algorithms built on the optimism principle fail to exploit ...
In this paper, we study the stochastic bandit problem with k unknown heavy-tailed and corrupted rew...
Despite the significant interest and progress in reinforcement learning (RL) problems with adversari...
The contextual bandit problem is typically used to model online applications such as articl...
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as ...
The bandit problem models a sequential decision process between a player and an environment. In the ...
Contextual combinatorial cascading bandit ($C^{3}$-bandit) is a powerful multi-armed bandit framew...
Contextual bandits with linear payoffs, which are also known as linear bandits, provide a powerful a...
A standard assumption in contextual multi-armed bandits is that the true context is perfectly known bef...
We analyze the K-armed bandit problem where the reward for each arm is a noisy realization based on ...
We introduce a stochastic contextual bandit model where at each time step the environment chooses a ...
Multi-armed bandit is a well-formulated test bed ...
Recent works on neural contextual bandits have achieved compelling performance thanks to their abili...
Contextual bandit algorithms are essential for solving many real-world interac...
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly...