In reinforcement learning, there has been recent progress toward safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting, where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the α-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navig...
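The bound described above can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper:

- Reward-function samples from a Bayesian IRL posterior are already available. Here they are replaced by random reward vectors, a purely hypothetical stand-in.
- The MDP is tabular with state-based rewards.
- The samples are treated as i.i.d.

For each sampled reward, the code computes the expected value difference (EVD) between the optimal policy and the evaluation policy. It then reports the empirical α-quantile of the EVDs and a (1 − δ)-confidence upper bound on that quantile, using a standard order-statistic argument. The helper names (`chain_mdp`, `expected_value_difference`, `var_upper_bound`, etc.) are hypothetical, and this is not necessarily the exact construction used in the paper.

```python
import math
import random
from typing import List, Optional, Sequence

# Tabular MDP with state-based rewards: P[s][a][t] is the probability of
# moving from state s to state t under action a.
Transitions = List[List[List[float]]]


def chain_mdp(n: int) -> Transitions:
    """Hypothetical toy MDP: action 0 moves left, action 1 moves right (clipped at the ends)."""
    P = []
    for s in range(n):
        left = [0.0] * n
        left[max(s - 1, 0)] = 1.0
        right = [0.0] * n
        right[min(s + 1, n - 1)] = 1.0
        P.append([left, right])
    return P


def policy_value(P, reward, policy, gamma, start, iters=200) -> float:
    """Expected discounted return of a deterministic policy (iterative policy evaluation)."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [reward[s] + gamma * sum(p * V[t] for t, p in enumerate(P[s][policy[s]]))
             for s in range(n)]
    return sum(start[s] * V[s] for s in range(n))


def optimal_value(P, reward, gamma, start, iters=200) -> float:
    """Expected discounted return of an optimal policy (value iteration)."""
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        V = [reward[s] + gamma * max(sum(p * V[t] for t, p in enumerate(P[s][a]))
                                     for a in range(len(P[s])))
             for s in range(n)]
    return sum(start[s] * V[s] for s in range(n))


def expected_value_difference(P, reward, policy, gamma, start) -> float:
    """EVD: optimal return minus evaluation-policy return under one sampled reward."""
    return (optimal_value(P, reward, gamma, start)
            - policy_value(P, reward, policy, gamma, start))


def empirical_var(samples: Sequence[float], alpha: float) -> float:
    """Empirical alpha-quantile (alpha-VaR) of the EVD samples."""
    xs = sorted(samples)
    return xs[max(0, math.ceil(alpha * len(xs)) - 1)]


def var_upper_bound(samples: Sequence[float], alpha: float, delta: float) -> Optional[float]:
    """(1 - delta)-confidence upper bound on the alpha-quantile via order statistics.

    Treats the samples as i.i.d. The k-th smallest sample is >= the alpha-quantile
    exactly when at most k - 1 samples fall below it, which (for a continuous
    distribution) has probability P(Binomial(n, alpha) <= k - 1).
    """
    xs = sorted(samples)
    n = len(xs)
    cdf = 0.0
    for k in range(1, n + 1):
        i = k - 1  # accumulate P(Binomial(n, alpha) = i) in log space to avoid overflow
        cdf += math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * math.log(alpha) + (n - i) * math.log1p(-alpha))
        if cdf >= 1.0 - delta:
            return xs[k - 1]
    return None  # too few samples for this (alpha, delta)


if __name__ == "__main__":
    random.seed(0)
    P = chain_mdp(4)
    start = [1.0, 0.0, 0.0, 0.0]   # always start in state 0
    policy = [1, 1, 1, 1]          # evaluation policy: always move right
    # Hypothetical stand-in for Bayesian IRL posterior samples: random rewards.
    rewards = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(200)]
    evds = [expected_value_difference(P, r, policy, 0.9, start) for r in rewards]
    print("empirical 0.95-VaR:", empirical_var(evds, 0.95))
    print("95%-confidence upper bound:", var_upper_bound(evds, 0.95, 0.05))
```

Using an order statistic, rather than the raw empirical quantile, is what lets the bound hold with probability at least 1 − δ under the i.i.d. assumption. The cost is that with too few samples no order statistic qualifies. For example, 10 samples at α = 0.95 and δ = 0.05 are too few, and `var_upper_bound` returns `None`.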
Many reinforcement learning algorithms use trajectories collected from the execution of one or more ...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
Learning desirable behavior from a limited number of demonstrations, also known as inverse reinforce...
As robots and other autonomous agents enter our homes, hospitals, schools, and workplaces, it is imp...
Learning near-optimal behaviour from an expert's demonstrations typically relies on the assumption t...
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov...
Inverse Reinforcement Learning (IRL) aims to recover a reward function from expert demonstrations in...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
Reinforcement learning means finding the optimal course of action in Markovian environment...
Inverse reinforcement learning (IRL) aims to recover the reward function underlying a Markov Decisio...
Various methods for solving the inverse reinforcement learning (IRL) problem have been developed ind...
Inverse reinforcement learning (IRL) addresses the problem of recovering a task descriptio...
We state the problem of inverse reinforcement learning in terms of preference elicitation, resulting...