Despite success in many challenging problems, reinforcement learning (RL) is still confronted with sample inefficiency, which can be mitigated by introducing prior knowledge to agents. However, many transfer techniques in reinforcement learning make the limiting assumption that the teacher is an expert. In this paper, we use the action prior from the Reinforcement Learning as Inference framework - that is, a distribution over actions at each state which resembles a teacher policy, rather than a Bayesian prior - to recover state-of-the-art policy distillation techniques. Then, we propose a class of adaptive methods that can robustly exploit action priors by combining reward shaping and auxiliary regularization losses. In contrast to prior wo...
In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pi...
Reinforcement learning is a powerful approach for learning control policies that solve sequential de...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
Despite success in many challenging problems, reinforcement learning (RL) is still confronted with s...
Abstract—The computational complexity of learning in sequen-tial decision problems grows exponential...
When applying reinforcement learning to real world problems it is desir-able to make use of any prio...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
We consider reinforcement learning in partially observable domains where the agent can query an expe...
textabstractMany traditional reinforcement-learning algorithms have been designed for problems with ...
Most provably-efficient reinforcement learning algorithms introduce opti-mism about poorly-understoo...
A long-standing challenge in reinforcement learning is the design of function approximations and eff...
Thesis (Ph.D.), Computer Science, Washington State UniversityReinforcement learning (RL) has had man...
Most provably-efficient reinforcement learning algorithms introduce opti-mism about poorly-understoo...
Deep Reinforcement Learning enables us to control increasingly complex and high-dimensional problems...
Abstract. In some reinforcement learning problems an agent may be provided with a set of input polic...
In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pi...
Reinforcement learning is a powerful approach for learning control policies that solve sequential de...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
Despite success in many challenging problems, reinforcement learning (RL) is still confronted with s...
Abstract—The computational complexity of learning in sequen-tial decision problems grows exponential...
When applying reinforcement learning to real world problems it is desir-able to make use of any prio...
The field of Reinforcement Learning is concerned with teaching agents to take optimal decisions t...
We consider reinforcement learning in partially observable domains where the agent can query an expe...
textabstractMany traditional reinforcement-learning algorithms have been designed for problems with ...
Most provably-efficient reinforcement learning algorithms introduce opti-mism about poorly-understoo...
A long-standing challenge in reinforcement learning is the design of function approximations and eff...
Thesis (Ph.D.), Computer Science, Washington State UniversityReinforcement learning (RL) has had man...
Most provably-efficient reinforcement learning algorithms introduce opti-mism about poorly-understoo...
Deep Reinforcement Learning enables us to control increasingly complex and high-dimensional problems...
Abstract. In some reinforcement learning problems an agent may be provided with a set of input polic...
In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pi...
Reinforcement learning is a powerful approach for learning control policies that solve sequential de...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...