The multi-armed bandit is a widely studied model for sequential decision-making problems. The most studied setting in the literature is the stochastic bandit, in which the reward of each arm follows an independent distribution. However, in a wide range of applications the rewards of different alternatives are correlated to some extent. In this paper, a class of structured bandit problems is studied in which the rewards of different arms are functions of the same unknown parameter vector. To minimize the cumulative learning regret, we propose a globally informative Thompson sampling algorithm that learns and leverages the correlation among arms, and that can handle an unknown multidimensional parameter and non-monotonic reward functions. Our studies d...
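To make the shared-parameter setting above concrete, the following is a minimal Python sketch of Thompson sampling over one common parameter theta, where every arm's mean reward is a known function of theta. The reward functions, noise level, and grid posterior are invented for illustration under simplifying assumptions; this is not the algorithm from the abstract above.

```python
# Minimal sketch: Thompson sampling for a structured bandit whose arms all
# depend on one shared parameter theta (illustrative assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

theta_true = 0.6                       # hidden shared parameter
arm_fns = [lambda t: t,                # arm 0: mean reward = theta
           lambda t: 1.0 - t,          # arm 1: mean reward = 1 - theta
           lambda t: 4 * t * (1 - t)]  # arm 2: non-monotonic in theta
noise_sd = 0.1                         # known Gaussian reward noise

# Discretised posterior over theta (a simple grid approximation).
grid = np.linspace(0.0, 1.0, 201)
log_post = np.zeros_like(grid)         # flat prior

for _ in range(2000):
    # Thompson step: draw theta from the current posterior,
    # then play the arm that looks best under that draw.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    theta_s = rng.choice(grid, p=post)
    arm = int(np.argmax([f(theta_s) for f in arm_fns]))

    # Observe a noisy reward and update the posterior. Note that one pull
    # is informative about *every* arm, because all arms share theta.
    r = arm_fns[arm](theta_true) + noise_sd * rng.normal()
    log_post += -0.5 * ((r - arm_fns[arm](grid)) / noise_sd) ** 2

post = np.exp(log_post - log_post.max())
post /= post.sum()
print("posterior mean of theta:", float(grid @ post))
```

The key design point the sketch exercises is that the posterior lives on the shared parameter rather than on per-arm means, which is what lets pulls of one arm reduce uncertainty about all the others.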
A challenging aspect of the bandit problem is that a stochastic reward is observed only for the chos...
The multi-armed bandit problem has been studied for de...
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision prob...
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in...
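Since several of these abstracts turn on the exploration/exploitation trade-off, here is a minimal, self-contained sketch of classic Beta-Bernoulli Thompson sampling; the arm probabilities and horizon are made up for illustration and are not taken from any of the cited works.

```python
# Minimal sketch: Beta-Bernoulli Thompson sampling on three independent arms
# (illustrative parameters; not from any specific paper above).
import numpy as np

rng = np.random.default_rng(1)
true_p = np.array([0.3, 0.5, 0.7])  # hidden Bernoulli success probabilities
alpha = np.ones(3)                  # Beta posterior: successes + 1
beta = np.ones(3)                   # Beta posterior: failures + 1

for _ in range(5000):
    # Draw one plausible mean per arm from its posterior; play the best draw.
    samples = rng.beta(alpha, beta)
    arm = int(np.argmax(samples))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward            # conjugate update is a simple count
    beta[arm] += 1 - reward

print("posterior means:", alpha / (alpha + beta))  # concentrates on arm 2
```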
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We study, to the best of our knowledge, the first Bayesian algorithm for unimodal Multi-Armed Bandit...
Presented at the Thirty-eighth International Conference on Machine Learning (ICML 2021)...
We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-arme...
We consider stochastic multi-armed bandit problems with complex actions over a set of basic arms, w...
We address multi-armed bandits (MAB) where the objective is to maximize the cumulative reward under ...
In this work, we address the combinatorial optimization problem in the stochastic bandit setting wit...
The multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in t...
The stochastic multi-armed bandit problem is a popular model of the exploratio...
In multi-objective reinforcement learning (MORL), much attention is paid to generating optimal solut...