We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks when they are similar according to some natural task-similarity measure. In the first work to target the adversarial setting, we design a unified meta-algorithm that yields setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-algorithm tunes the initialization, step-size, and entropy parameter of the Tsallis-entropy generalization of the well-known Exp3 method, with the task-averaged regret provably improving if the entropy of the distribution over estimated optima-in-hindsight is small. For BLO, we learn the initialization, step-size, and ...
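The base learner this abstract describes, a Tsallis-entropy generalization of Exp3, is concrete enough to sketch. Below is a minimal Python sketch, under my own assumptions, of follow-the-regularized-leader (FTRL) over the probability simplex with a Tsallis-entropy regularizer and importance-weighted loss estimates; the three knobs it exposes (the initialization `p_init`, the step-size `eta`, and the entropy parameter `beta`) are exactly the quantities the abstract says the meta-algorithm tunes per task. The function names, and the trick of encoding the initialization as an offset on the cumulative loss estimates, are illustrative choices of mine, not the paper's construction.

```python
import numpy as np

def tsallis_ftrl_step(L, eta, beta):
    """Solve  argmin_{p in simplex} <p, L> + (1/eta) * (1 - sum_i p_i^beta) / (1 - beta).

    Stationarity gives p_i = (c / (L_i + lam))^(1/(1-beta)) with c = beta / (eta*(1-beta)),
    where the multiplier lam is found by bisection (total mass is decreasing in lam).
    Requires beta in (0, 1) and eta > 0.
    """
    c = beta / (eta * (1.0 - beta))
    mass = lambda lam: ((c / (L + lam)) ** (1.0 / (1.0 - beta))).sum()
    lo = -L.min() + 1e-12            # smallest lam keeping every L_i + lam > 0
    hi = lo + 1.0
    while mass(hi) > 1.0:            # grow the bracket until mass(hi) <= 1
        hi += hi - lo
    for _ in range(100):             # bisect on the normalizing multiplier
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mass(mid) > 1.0 else (lo, mid)
    p = (c / (L + hi)) ** (1.0 / (1.0 - beta))
    return p / p.sum()               # renormalize away residual bisection error

def run_task(losses, eta=0.1, beta=0.5, p_init=None, seed=0):
    """One bandit task: FTRL with Tsallis entropy on importance-weighted loss
    estimates. losses[t, i] is the loss of arm i at round t (values in [0, 1]).
    p_init must have strictly positive entries; eta, beta, p_init are the knobs
    a meta-learner would tune across tasks."""
    losses = np.asarray(losses, float)
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    p_init = np.full(K, 1.0 / K) if p_init is None else np.asarray(p_init, float)
    c = beta / (eta * (1.0 - beta))
    L_hat = c * p_init ** (beta - 1.0)   # offset so the first iterate equals p_init
    total = 0.0
    for t in range(T):
        p = tsallis_ftrl_step(L_hat, eta, beta)
        a = rng.choice(K, p=p)           # play an arm, observe only its loss
        total += losses[t, a]
        L_hat[a] += losses[t, a] / p[a]  # unbiased importance-weighted estimate
    return total
```

Exp3 is recovered in the limit beta -> 1, while beta = 1/2 gives the Tsallis-INF-style update; a meta-learner in the abstract's sense would run this per task and adjust `eta`, `beta`, and `p_init` across tasks based on the estimated optima-in-hindsight.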
We demonstrate a modification of the algorithm of Dani et al. for the online linear optimization prob...
We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at ea...
We consider the problem of online learning in misspecified linear stochastic multi-armed bandit prob...
We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online set...
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BO...
Much of modern learning theory has been split between two regimes: the classical offline setting, wh...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a...
Online learning algorithms are designed to learn even when their input is generated by an adversary....
Model selection in the context of bandit optimization is a challenging problem, as it requires balan...
We study the attainable regret for online linear optimization problems with bandit feedback, where u...
We study the adversarial bandit problem with $S$ switches of the best arm, where $S$ is unknown. For...
We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion...
This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedb...
We study a new class of online learning problems where each of the online algorithm’s actions is ass...