We consider learning in a Markov decision process where we are not explicitly given a reward function, but where instead we can observe an expert demonstrating the task that we want to learn to perform. This setting is useful in applications (such as the task of driving) where it may be difficult to write down an explicit reward function specifying exactly how different desiderata should be traded off. We think of the expert as trying to maximize a reward function that is expressible as a linear combination of known features, and give an algorithm for learning the task demonstrated by the expert. Our algorithm is based on using “inverse reinforcement learning” to try to recover the unknown reward function. We show that our algorithm termin...
As the field of robotic and humanoid systems expands, more research is being done on how to best cont...
A major challenge faced by the machine learning community is the problem of decision making under uncertai...
Learning desirable behavior from a limited number of demonstrations, also known as inverse reinforce...
In traditional Reinforcement Learning (RL) [4], a single agent learns to act in an environment by op...
One of the fundamental problems of artificial intelligence is learning how to behave optimally. With...
This paper addresses the problem of apprenticeship learning, that is learning ...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
This paper addresses the problem of apprenticeship learning, that is learning ...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
In this paper we study the question of life long learning of behaviors from human demonstrations by ...
Inverse reinforcement learning (IRL) addresses the problem of recovering a task descriptio...
In decision-making problems, the reward function plays an important role in finding the best policy. Rein...
This paper provides a comparative study between Inverse Reinforcement Learning (IRL) and A...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
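The first abstract above models the expert's reward as a linear combination of known features. As a minimal sketch of what that assumption implies (not the authors' algorithm), the discounted return of a set of demonstrations can be written as the dot product of a weight vector with empirical discounted feature expectations. The function name `feature_expectations`, the feature map `phi`, the discount `gamma`, and the toy two-state data are all illustrative assumptions, not taken from any of the papers listed.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Average discounted feature sum over demonstrated state trajectories.

    trajectories: list of state sequences (hypothetical demonstration data)
    phi: function mapping a state to a feature vector (assumed known)
    gamma: discount factor (assumed)
    """
    total = None
    for traj in trajectories:
        # Discounted sum of feature vectors along one trajectory.
        disc = sum((gamma ** t) * np.asarray(phi(s), dtype=float)
                   for t, s in enumerate(traj))
        total = disc if total is None else total + disc
    return total / len(trajectories)

# Toy example: two states with one-hot features.
one_hot = lambda s: np.eye(2)[s]
mu_expert = feature_expectations([[0, 1], [1, 1]], one_hot, gamma=0.5)

# Under a linear reward R(s) = w . phi(s), the average discounted
# return of the demonstrations is w . mu_expert.
w = np.array([1.0, 2.0])
expected_return = float(w @ mu_expert)
```

Because the return is linear in the weights, a learner whose feature expectations are close to the expert's attains a similar return for any bounded weight vector, which is the intuition behind matching feature expectations rather than recovering the reward exactly.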