Imitation learning from observation describes policy learning in a way similar to human learning: an agent's policy is trained by observing an expert performing a task. Although many state-only imitation learning approaches are based on adversarial imitation learning, a main drawback is that adversarial training is often unstable and lacks a reliable convergence estimator. If the true environment reward is unknown and cannot be used to select the best-performing model, this can result in poor real-world policy performance. We propose a non-adversarial learning-from-observation approach, together with an interpretable convergence and performance metric. Our training objective minimizes the Kullback-Leibler divergence (KLD) between the polic...
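The abstract above centers on minimizing a Kullback-Leibler divergence between distributions induced by the policy and the expert. As a minimal sketch of the quantity involved (not the paper's actual objective, which is truncated here), the KLD between two diagonal-Gaussian distributions can be computed in closed form; the function name and parameterization below are illustrative assumptions:

```python
import numpy as np

def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for diagonal Gaussians N(mu_p, sigma_p^2) and N(mu_q, sigma_q^2).

    Returns the divergence summed over dimensions. This is only an
    illustrative helper, not the objective from the abstract above.
    """
    var_p, var_q = sigma_p ** 2, sigma_q ** 2
    per_dim = (
        np.log(sigma_q / sigma_p)                     # log-ratio of std devs
        + (var_p + (mu_p - mu_q) ** 2) / (2 * var_q)  # variance + mean-shift term
        - 0.5                                         # normalization constant
    )
    return float(np.sum(per_dim))

# KL of a distribution with itself is zero; it grows as the means separate.
same = kl_diag_gaussians(np.zeros(2), np.ones(2), np.zeros(2), np.ones(2))
shifted = kl_diag_gaussians(np.zeros(2), np.ones(2), np.ones(2), np.ones(2))
```

The closed form avoids sampling, which is one reason KLD-based objectives can admit more interpretable convergence monitoring than adversarial losses.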
We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only...
Adversarial imitation learning (AIL) has become a popular alternative to supervised imitation learni...
Modelling a reward able to convey the right incentive to the agent is fairly tedious in terms of eng...
We consider the problem of imitation learning from a finite set of expert trajectories, without acce...
Imitation Learning (IL) is an important paradigm within the broader reinforcement learning (RL) meth...
Imitation learning, sometimes referred to as learning from demonstrations, has been used in real world ...
Offline imitation from observations aims to solve MDPs where only task-specific expert states and ta...
Adversarial imitation learning has become a widely used imitation learning framework. The discrimina...
Advances in robotics have resulted in increases both in the availability of robots and also their co...
RJCIA 2022. National audience. Deep Reinforcement Learning methods require a large amount of data to ach...
Many existing imitation learning datasets are collected from multiple demonstrators, each with diffe...
Reinforcement learning (RL) provides a powerful framework for decision-making, but its application i...
Given a dataset of expert agent interactions with an environment of interest, a viable method to ext...
In this work we formulate and treat an extension of the Imitation from Observations problem. Imitati...
The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the d...