We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Markovian rewards. Existing work assumes that human evaluators observe each step in a trajectory independently when providing feedback on agent behaviour. In this work, we remove this assumption, extending RM to capture temporal dependencies in human assessment of trajectories. We show how RM can be approached as a multiple instance learning (MIL) problem, where trajectories are treated as bags with return labels, and steps within the trajectories are instances with unseen reward labels. We go on to develop new MIL models that are able to capture the time dependencies in labelled trajectories. We demonstrate on a range of RL tasks that our novel...
Reinforcement learning involves the study of how to solve sequential decision-making problems using ...
<div><p>Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), w...
We study a class of reinforcement learning tasks in which the agent receives its reward for complex,...
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Mar...
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Mar...
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is t...
The standard feedback model of reinforcement learning requires revealing the reward of every visited...
We consider a learning problem where the decision maker interacts with a standard Markov decision pr...
Existing inverse reinforcement learning (IRL) algorithms have assumed each ex-pert’s demonstrated tr...
Many challenging partially observable reinforcement learning problems have sparse rewards and most e...
International audienceMarkovian systems are widely used in reinforcement learning (RL), when the suc...
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated tra...
Unlike the standard Reinforcement Learning (RL) model, many real-world tasks are non-Markovian, whos...
Learning to act optimally in the complex world has long been a major goal in artificial intelligence...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
Reinforcement learning involves the study of how to solve sequential decision-making problems using ...
<div><p>Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), w...
We study a class of reinforcement learning tasks in which the agent receives its reward for complex,...
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Mar...
We generalise the problem of reward modelling (RM) for reinforcement learning (RL) to handle non-Mar...
The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is t...
The standard feedback model of reinforcement learning requires revealing the reward of every visited...
We consider a learning problem where the decision maker interacts with a standard Markov decision pr...
Existing inverse reinforcement learning (IRL) algorithms have assumed each ex-pert’s demonstrated tr...
Many challenging partially observable reinforcement learning problems have sparse rewards and most e...
International audienceMarkovian systems are widely used in reinforcement learning (RL), when the suc...
Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated tra...
Unlike the standard Reinforcement Learning (RL) model, many real-world tasks are non-Markovian, whos...
Learning to act optimally in the complex world has long been a major goal in artificial intelligence...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
Reinforcement learning involves the study of how to solve sequential decision-making problems using ...
<div><p>Humans can learn under a wide variety of feedback conditions. Reinforcement learning (RL), w...
We study a class of reinforcement learning tasks in which the agent receives its reward for complex,...