We consider the problem of imitation learning from a finite set of expert trajectories, without access to reinforcement signals. The classical approach of extracting the expert's reward function via inverse reinforcement learning, followed by reinforcement learning, is indirect and may be computationally expensive. Recent generative adversarial methods based on matching the policy distribution between the expert and the agent can be unstable during training. We propose a new framework for imitation learning that estimates the support of the expert policy to compute a fixed reward function, which allows us to re-frame imitation learning within the standard reinforcement learning setting. We demonstrate the efficacy of our reward function on ...
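To make the idea of a fixed, support-based reward concrete, here is a minimal sketch. It is not the estimator used in the paper, whose details are not given in this abstract. It assumes a simple stand-in: the squared distance from a state-action vector to its nearest expert sample serves as a crude support estimate, and the reward decays as that distance grows. The names `make_support_reward`, `expert_pairs`, and `bandwidth` are hypothetical.

```python
import math


def make_support_reward(expert_pairs, bandwidth=1.0):
    """Build a fixed reward function from expert (state, action) vectors.

    Assumption for illustration only: closeness to the nearest expert
    sample is used as a rough proxy for "inside the expert's support".
    """
    if not expert_pairs:
        raise ValueError("need at least one expert sample")
    experts = [tuple(p) for p in expert_pairs]

    def reward(state_action):
        # Squared Euclidean distance to the closest expert sample.
        nearest_sq = min(
            sum((x - e) ** 2 for x, e in zip(state_action, sample))
            for sample in experts
        )
        # 1.0 on an expert sample, decaying toward 0 far from the data.
        return math.exp(-nearest_sq / bandwidth)

    return reward


# Toy expert data: each tuple is a concatenated (state, action) vector.
expert_pairs = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
reward = make_support_reward(expert_pairs)

on_support = reward((0.0, 0.0, 1.0))    # coincides with an expert sample
off_support = reward((5.0, 5.0, -1.0))  # far from every expert sample
```

Because the reward is computed once from the demonstrations and never updated, it could in principle be passed to any standard reinforcement learning algorithm in place of an environment reward, which is the re-framing the abstract describes.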
Despite the recent success of reinforcement learning in various domains, these approaches remain, fo...
The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the d...
Sequential decisions and predictions are common problems in natural language processing, robotics, a...
Inverse Reinforcement Learning (IRL) learns an optimal policy, given some expert demonstrations, thu...
As robots and other autonomous agents enter our homes, hospitals, schools, and workplaces, it is imp...
Imitation Learning from observation describes policy learning in a similar way to human learning. An...
Many existing imitation learning datasets are collected from multiple demonstrators, each with diffe...
Reinforcement learning (RL) provides a powerful framework for decision-making, but its application i...
A common problem in Reinforcement Learning (RL) is that the reward function is hard to express. This...
Unlike classic Supervised Learning, Reinforcement Learning (RL) is fundamentally interactiv...
We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only...
Learning from Demonstrations (LfD) is a paradigm by which an apprentice agent...
When cast into the Deep Reinforcement Learning framework, many robotics tasks ...
We study how to effectively leverage expert feedback to learn sequential decision-making policies. W...
Imitation learning refers to the problem where an agent learns a policy that mimics the demonstratio...