We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety ...
In this paper, we describe how techniques from reinforcement learning might be used to approach the ...
Learning desirable behavior from a limited number of demonstrations, also known as inverse reinforce...
Inverse reinforcement learning (IRL) addresses the problem of re-covering the unknown reward functio...
We consider the problem of learning by demonstration from agents acting in un- known stochastic Mark...
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and t...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
Abstract. We state the problem of inverse reinforcement learning in terms of preference elicitation,...
In decision-making problems reward function plays an important role in finding the best policy. Rein...
Various methods for solving the inverse reinforcement learning (IRL) problem have been developed ind...
In the field of reinforcement learning there has been recent progress towards safety and high-confid...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
Learning to play in the presence of independent and self-motivated opponents is a difficult task, be...
Abstract The problem of reinforcement learning in a non-Markov environment isexplored using a dynami...
In this paper, we describe how techniques from reinforcement learning might be used to approach the ...
Learning desirable behavior from a limited number of demonstrations, also known as inverse reinforce...
Inverse reinforcement learning (IRL) addresses the problem of re-covering the unknown reward functio...
We consider the problem of learning by demonstration from agents acting in un- known stochastic Mark...
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and t...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
Abstract. We state the problem of inverse reinforcement learning in terms of preference elicitation,...
In decision-making problems reward function plays an important role in finding the best policy. Rein...
Various methods for solving the inverse reinforcement learning (IRL) problem have been developed ind...
In the field of reinforcement learning there has been recent progress towards safety and high-confid...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
Learning to play in the presence of independent and self-motivated opponents is a difficult task, be...
Abstract The problem of reinforcement learning in a non-Markov environment isexplored using a dynami...
In this paper, we describe how techniques from reinforcement learning might be used to approach the ...
Learning desirable behavior from a limited number of demonstrations, also known as inverse reinforce...
Inverse reinforcement learning (IRL) addresses the problem of re-covering the unknown reward functio...