We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm has access to a prior distribution over the meta-parameters and its meta simple regret over m bandit tasks with horizon n is a mere O(m/√n). On the other hand, the meta simple regret of the frequentist algorithm is O(n√m + m/√n). While its regret is worse, the frequentist algorithm is more general because it does not need a prior distribution over the meta-parameters.
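To make the setup above concrete, here is a minimal Python sketch of such a meta-learning loop: tasks are drawn i.i.d. from an unknown prior, each task gets an exploration phase of n rounds followed by an arm recommendation scored by simple regret, and the meta-parameters are updated after every task. The Gaussian reward model, the Thompson-style exploration, and all variable names here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n, m = 5, 200, 50            # arms, per-task horizon, number of tasks
mu_star = rng.normal(0, 1, K)   # unknown prior mean of arm rewards (hidden)
sigma_0, sigma = 1.0, 1.0       # task-level and reward noise scales (assumed)

# Meta-parameters: a Gaussian posterior over mu_star, one per arm.
meta_mean = np.zeros(K)
meta_var = np.full(K, 10.0)     # wide initial uncertainty

for task in range(m):
    theta = rng.normal(mu_star, sigma_0)   # task sampled i.i.d. from the prior
    counts = np.zeros(K)
    sums = np.zeros(K)
    for t in range(n):                     # exploration phase
        # Thompson-style draw from the meta-informed posterior over theta.
        post_var = 1.0 / (1.0 / (meta_var + sigma_0**2) + counts / sigma**2)
        post_mean = post_var * (meta_mean / (meta_var + sigma_0**2)
                                + sums / sigma**2)
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(theta[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
    # Recommendation step: simple regret compares the best arm to the pick.
    recommended = int(np.argmax(sums / np.maximum(counts, 1)))
    simple_regret = theta.max() - theta[recommended]
    if (task + 1) % 10 == 0:
        print(f"task {task + 1}: simple regret {simple_regret:.3f}")
    # Meta-update: fold each observed arm's empirical mean back into the
    # posterior over mu_star (standard Gaussian conjugate update).
    obs = counts > 0
    emp_mean = sums[obs] / counts[obs]
    obs_var = sigma_0**2 + sigma**2 / counts[obs]
    new_var = 1.0 / (1.0 / meta_var[obs] + 1.0 / obs_var)
    meta_mean[obs] = new_var * (meta_mean[obs] / meta_var[obs]
                                + emp_mean / obs_var)
    meta_var[obs] = new_var
```

After enough tasks, meta_mean concentrates around the hidden prior mean mu_star, so exploration in later tasks starts closer to the truth; that transfer across tasks is the effect the regret bounds above quantify.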
Imitation learning has been widely used to speed up learning in novice agents, by allowing them to l...
We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online set...
Fast adaptation to changes in the environment requires agents (animals, robots...
We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal ...
We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic ba...
We study online learning with bandit feedback across multiple tasks, with the goal of improving aver...
In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear sto...
Classical regret minimization in a bandit framework involves a number of probability distributions ...
In contextual bandits, an algorithm must choose actions given observed contexts, learning from a r...
Motivated by recent developments on meta-learning with linear contextual bandit tasks, we study the ...
Meta-learning tackles various means of learning from past tasks to perform new...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
We study learning algorithms for the classical Markovian bandit problem with discount. We explain ho...