How can we effectively exploit the collected samples when solving a continuous control task with Reinforcement Learning? Recent results have empirically demonstrated that multiple policy optimization steps can be performed with the same batch by using off-distribution techniques based on importance sampling. However, when dealing with off-distribution optimization, it is essential to take into account the uncertainty introduced by the importance sampling process. In this paper, we propose and analyze a class of model-free, policy search algorithms that extend the recent Policy Optimization via Importance Sampling (Metelli et al., 2018) by incorporating two advanced variance reduction techniques: per-decision and multiple importance sampling...
Stochastic processes are an important theoretical tool to model sequential phenomena in the natural...
In importance sampling (IS)-based reinforcement learning algorithms such as Proximal Policy Optimiza...
It is known that existing policy gradient methods (such as vanilla policy gradient, PPO, A2C) may su...
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past,...
Off-policy reinforcement learning is aimed at efficiently using data samples gathered from a policy ...
We consider the transfer of experience samples in reinforcement learning. Most of the previous works...
Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this pape...
In order for reinforcement learning techniques to be useful in real-world decision making processes,...
A central challenge to applying many off-policy reinforcement learning algorithms to real world prob...
Gradient-based policy search is an alternative to value-function-based methods for reinforcement lea...
Learning in a lifelong setting, where the dynamics continually evolve, is a hard challenge for curre...
This thesis considers three complications that arise from applying reinforcement learning to a real-...