We study online learning with bandit feedback across multiple tasks, with the goal of improving average performance across tasks when they are similar according to some natural task-similarity measure. In the first work to target the adversarial setting, we design a unified meta-algorithm that yields setting-specific guarantees for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-algorithm tunes the initialization, step-size, and entropy parameter of the Tsallis-entropy generalization of the well-known Exp3 method, with the task-averaged regret provably improving if the entropy of the distribution over estimated optima-in-hindsight is small. For BLO, we learn the initialization, step-size, and ...
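The base learner this abstract describes, a Tsallis-entropy generalization of Exp3, is concrete enough to sketch. Below is a minimal Python sketch, under my own assumptions, of follow-the-regularized-leader (FTRL) over the probability simplex with a Tsallis-entropy regularizer and importance-weighted loss estimates; the three knobs it exposes (the initialization `p_init`, the step-size `eta`, and the entropy parameter `beta`) are exactly the quantities the abstract says the meta-algorithm tunes per task. The function names, and the trick of encoding the initialization as an offset on the cumulative loss estimates, are illustrative choices of mine, not the paper's construction.

```python
import numpy as np

def tsallis_ftrl_step(L, eta, beta):
    """Solve  argmin_{p in simplex} <p, L> + (1/eta) * (1 - sum_i p_i^beta) / (1 - beta).

    Stationarity gives p_i = (c / (L_i + lam))^(1/(1-beta)) with c = beta / (eta*(1-beta)),
    where the multiplier lam is found by bisection (total mass is decreasing in lam).
    Requires beta in (0, 1) and eta > 0.
    """
    c = beta / (eta * (1.0 - beta))
    mass = lambda lam: ((c / (L + lam)) ** (1.0 / (1.0 - beta))).sum()
    lo = -L.min() + 1e-12            # smallest lam keeping every L_i + lam > 0
    hi = lo + 1.0
    while mass(hi) > 1.0:            # grow the bracket until mass(hi) <= 1
        hi += hi - lo
    for _ in range(100):             # bisect on the normalizing multiplier
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mass(mid) > 1.0 else (lo, mid)
    p = (c / (L + hi)) ** (1.0 / (1.0 - beta))
    return p / p.sum()               # renormalize away residual bisection error

def run_task(losses, eta=0.1, beta=0.5, p_init=None, seed=0):
    """One bandit task: FTRL with Tsallis entropy on importance-weighted loss
    estimates. losses[t, i] is the loss of arm i at round t (values in [0, 1]).
    p_init must have strictly positive entries; eta, beta, p_init are the knobs
    a meta-learner would tune across tasks."""
    losses = np.asarray(losses, float)
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    p_init = np.full(K, 1.0 / K) if p_init is None else np.asarray(p_init, float)
    c = beta / (eta * (1.0 - beta))
    L_hat = c * p_init ** (beta - 1.0)   # offset so the first iterate equals p_init
    total = 0.0
    for t in range(T):
        p = tsallis_ftrl_step(L_hat, eta, beta)
        a = rng.choice(K, p=p)           # play an arm, observe only its loss
        total += losses[t, a]
        L_hat[a] += losses[t, a] / p[a]  # unbiased importance-weighted estimate
    return total
```

Exp3 is recovered in the limit beta -> 1, while beta = 1/2 gives the Tsallis-INF-style update; a meta-learner in the abstract's sense would run this per task and adjust `eta`, `beta`, and `p_init` across tasks based on the estimated optima-in-hindsight.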
We demonstrate a modification of the algorithm of Dani et al. for the online linear optimization prob...
We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at ea...
We consider the problem of online learning in misspecified linear stochastic multi-armed bandit prob...
We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online set...
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BO...
Much of modern learning theory has been split between two regimes: the classical offline setting, wh...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a...
Online learning algorithms are designed to learn even when their input is generated by an adversary....
Model selection in the context of bandit optimization is a challenging problem, as it requires balan...
We study the attainable regret for online linear optimization problems with bandit feedback, where u...
We study the adversarial bandit problem with $S$ switches of the best arm, where $S$ is unknown. For...
We provide a new analysis framework for the adversarial multi-armed bandit problem. Using the notion...
This paper considers no-regret learning for repeated continuous-kernel games with lossy bandit feedb...
We study a new class of online learning problems where each of the online algorithm’s actions is ass...