We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against...
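As a minimal illustration of the approach described above, the sketch below estimates state-action utilities from demonstrations under a softmax (Boltzmann) policy model; with a flat prior, the MAP estimate reduces to maximising a log-likelihood that is concave in the utilities, i.e. a convex optimisation problem. The softmax model, the toy demonstrations, and all variable names are assumptions made for illustration, not the exact probabilistic models derived in the paper.

    # Sketch: MAP estimation of a demonstrator's state-action utilities Q
    # under a softmax policy model. With a flat prior this is just maximum
    # likelihood, and the negative log-likelihood below is convex in Q.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import logsumexp

    def neg_log_likelihood(q_flat, demos, n_states, n_actions):
        # pi(a | s) proportional to exp(Q[s, a])
        Q = q_flat.reshape(n_states, n_actions)
        nll = 0.0
        for s, a in demos:
            nll -= Q[s, a] - logsumexp(Q[s])
        # Tiny quadratic penalty keeps the toy problem bounded; under a
        # truly flat prior this term would be absent (a Gaussian prior
        # would contribute exactly such a term).
        nll += 1e-3 * np.sum(Q ** 2)
        return nll

    # Toy demonstrations: (state, action) pairs observed from the expert.
    demos = [(0, 1), (0, 1), (1, 0), (2, 2), (2, 2), (1, 0)]
    n_states, n_actions = 3, 3

    res = minimize(neg_log_likelihood, np.zeros(n_states * n_actions),
                   args=(demos, n_states, n_actions), method="L-BFGS-B")
    Q_hat = res.x.reshape(n_states, n_actions)

    # An improved (greedy) policy with respect to the estimated utilities.
    policy = Q_hat.argmax(axis=1)
    print("estimated greedy policy:", policy)

In a full treatment one would also account for the unknown dynamics when turning the estimated utilities into a policy; the greedy step above is only a placeholder for that stage.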
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
In the field of reinforcement learning there has been recent progress towards safety and high-confid...
Reinforcement Learning (RL) in either fully or partially observable domains usually poses a requirem...
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
Various methods for solving the inverse reinforcement learning (IRL) problem have been developed ind...
In decision-making problems, the reward function plays an important role in finding the best policy. Rein...
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and t...