We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online setup, in which a player (learner) encounters a sequence of multi-armed bandit episodes. The player's performance is measured as regret against the best arm in each episode, according to the losses generated by an adversary. The difficulty of the problem depends on the empirical distribution of the per-episode best arm chosen by the adversary. We present an algorithm that can leverage the non-uniformity in this empirical distribution, and derive problem-dependent regret bounds. This solution comprises an inner learner that plays each episode separately, and an outer learner that updates the hyper-parameters of the inner algorithm between the episodes...
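The abstract only names the inner/outer structure, so the following is a minimal sketch of what such an online-within-online loop could look like, not the paper's actual algorithm: the inner learner is assumed to be Exp3 warm-started from a prior over arms, and the outer learner is assumed to nudge that prior toward the most-played arm of the finished episode. All names (`run_episode`, `outer_update`) and hyper-parameters (`lr`, `mix`, `floor`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_episode(losses, prior, lr):
    """Inner learner (assumed Exp3) for one episode, warm-started from `prior`.

    losses: (n_rounds, n_arms) adversarial losses in [0, 1].
    prior:  initial arm distribution supplied by the outer learner.
    lr:     learning rate, a hyper-parameter set by the outer learner.
    Returns the arms played during the episode.
    """
    n_rounds, n_arms = losses.shape
    weights = prior.copy()
    played = []
    for t in range(n_rounds):
        probs = weights / weights.sum()
        arm = rng.choice(n_arms, p=probs)
        played.append(arm)
        # Importance-weighted loss estimate: under bandit feedback only the
        # played arm's loss is observed.
        loss_est = losses[t, arm] / probs[arm]
        weights[arm] *= np.exp(-lr * loss_est)
    return np.array(played)

def outer_update(prior, played, mix=0.9, floor=1e-3):
    """Outer learner (sketch): shift the prior toward the arm played most in
    the finished episode, keeping a floor on every arm so the inner learner
    can still recover if the adversary changes the best arm."""
    counts = np.bincount(played, minlength=prior.size).astype(float)
    new_prior = mix * prior + (1 - mix) * counts / counts.sum()
    new_prior = np.maximum(new_prior, floor)
    return new_prior / new_prior.sum()

# Hypothetical online-within-online loop over a few episodes.
n_arms, n_rounds, n_episodes = 5, 200, 10
prior = np.full(n_arms, 1.0 / n_arms)
for _ in range(n_episodes):
    losses = rng.random((n_rounds, n_arms))  # adversarial in the paper; random here
    played = run_episode(losses, prior, lr=0.05)
    prior = outer_update(prior, played)
```

If the adversary's per-episode best arms are concentrated on a few arms, a prior of this kind lets the inner learner start each episode closer to the right arm, which is the non-uniformity the abstract's problem-dependent bounds refer to.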
Model selection in the context of bandit optimization is a challenging problem, as it requires balan...
We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We ...
We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a...
We study online learning with bandit feedback across multiple tasks, with the goal of improving aver...
In this paper, we investigate the impact of diverse user preference on learning under the stochastic...
This paper considers the multi-armed bandit (MAB) problem and provides a new best-of-both-worlds (BO...
We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at ea...
We study online learning in adversarial communicating Markov Decision Processes with full informatio...
We study a new class of online learning problems where each of the online algorithm’s actions is ass...
We study two problems of online learning under restricted information access. In the first problem, ...
Online learning algorithms are designed to learn even when their input is generated by an adversary....
We study a multiplayer stochastic multi-armed bandit problem in which players ...
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlate...
We consider running multiple instances of multi-armed bandit (MAB) problems in parallel. A main moti...
In a bandit problem there is a set of arms, each of which when played by an agent yields some reward...