In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems whose tasks can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when task parameters are instead drawn from a mixture distribution. For this problem, we propose a regularized version of the OFUL algorithm that, when trained on tasks with labeled environments, achieves low regret on a new task without requiring knowledge of the environment from which the new task originates. Specifically, our regret bound for the new algorithm...
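The abstract names a regularized variant of OFUL but the page cuts off before describing the construction, so the following is only a plausible sketch: it assumes the regularizer biases the ridge estimate toward a mean parameter h learned from the meta-training tasks, which is the standard way OFUL is biased in the single-environment setting of [1]. The class name BiasedOFUL, the bias vector h, and the fixed confidence width beta are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

class BiasedOFUL:
    """Sketch of OFUL with a biased ridge regularizer.

    The least-squares estimate is regularized toward a bias vector `h`
    (e.g. a mean parameter learned from earlier tasks) rather than toward
    zero, i.e. it minimizes  sum_s (r_s - <x_s, theta>)^2 + lam * ||theta - h||^2.
    All names and the constant confidence width are illustrative.
    """

    def __init__(self, dim, h, lam=1.0, beta=1.0):
        self.lam = lam                  # ridge regularization strength
        self.beta = beta                # confidence width (fixed here for simplicity)
        self.V = lam * np.eye(dim)      # regularized Gram matrix: lam*I + sum x x^T
        self.b = lam * h.copy()         # shifted RHS so that V @ theta_hat = b

    def select(self, arms):
        """Pick the arm (row of `arms`) with the largest optimistic estimate."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b      # biased ridge estimate
        # UCB: <x, theta_hat> + beta * ||x||_{V^{-1}} for each arm x
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, V_inv, arms))
        return int(np.argmax(arms @ theta_hat + self.beta * widths))

    def update(self, x, reward):
        """Rank-one update after observing `reward` for arm feature `x`."""
        self.V += np.outer(x, x)
        self.b += reward * x
```

In the mixture-of-environments setting the paper targets, h would presumably be constructed from the labeled training environments (e.g. some weighted combination of per-environment mean parameters), which is precisely the part the truncated abstract leaves open.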