Off-policy reinforcement learning aims to make efficient use of data samples gathered from a policy that differs from the currently optimized policy. A common approach is to use importance sampling techniques to compensate for the bias of value function estimators caused by the mismatch between the data-sampling policy and the target policy. However, existing off-policy methods often do not explicitly take the variance of the value function estimators into account, and their performance therefore tends to be unstable. To cope with this problem, we propose an adaptive importance sampling technique which allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining ...
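To make the bias-variance trade-off described above concrete, the following is a minimal sketch of an off-policy return estimator with a flattening exponent nu applied to the cumulative importance weight: nu = 1 recovers ordinary importance sampling (unbiased but high variance), while nu = 0 ignores the policy mismatch entirely (low variance but biased). The function names and the exact per-episode weighting scheme are assumptions made for illustration, not the paper's own formulation.

```python
import numpy as np

def flattened_is_return(trajectories, pi_target, pi_behavior, gamma=0.95, nu=0.5):
    """Off-policy estimate of the target policy's expected return using
    flattened importance weights (illustrative sketch).

    `trajectories` is a list of episodes, each a list of (state, action,
    reward) tuples; `pi_target(a, s)` and `pi_behavior(a, s)` return the
    probability of action a in state s under the respective policy.
    """
    estimates = []
    for episode in trajectories:
        ratio, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            ratio *= pi_target(a, s) / pi_behavior(a, s)
            ret += (gamma ** t) * r
        # Raising the cumulative likelihood ratio to the power nu "flattens"
        # it, trading unbiasedness for a reduction in variance.
        estimates.append((ratio ** nu) * ret)
    return float(np.mean(estimates))
```

Choosing nu between 0 and 1 interpolates between the two extremes; the truncated final sentence of the abstract refers to a method for selecting this trade-off automatically, whose details are not reproduced here.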
This thesis considers three complications that arise from applying reinforcement learning to a real-...
Recent advances in reinforcement learning (RL) provide exciting potential for making agents...
In this dissertation we develop new methodologies and frameworks to address challenges in offline re...
Off-policy reinforcement learning is aimed at efficiently reusing data samples gathered in the past,...
A central challenge to applying many off-policy reinforcement learning algorithms to real world prob...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
How can we effectively exploit the collected samples when solving a continuous control task with Rei...
Offline reinforcement learning involves training a decision-making agent based solely on historical ...
Abstract In this paper we analyze a particular issue of estimation, namely the estimation of the exp...
We consider the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goa...
Importance sampling is often used in machine learning when training and testing data come from diffe...
Importance Sampling (IS) is a widely used building block for a large variety of off-policy estimatio...
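Since several of the snippets above point to importance sampling as the basic building block of off-policy estimation, here is a brief sketch of the two standard per-trajectory estimators, ordinary and weighted (self-normalized) importance sampling; the variable names are illustrative rather than taken from any of the cited works.

```python
import numpy as np

def is_estimators(returns, weights):
    """Ordinary and weighted (self-normalized) importance sampling estimates
    of a target policy's value from behavior-policy episodes.

    `returns[i]` is the discounted return of episode i and `weights[i]` its
    cumulative likelihood ratio prod_t pi_target(a_t|s_t) / pi_behavior(a_t|s_t).
    """
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ordinary = float(np.mean(weights * returns))                    # unbiased, high variance
    weighted = float(np.sum(weights * returns) / np.sum(weights))   # biased, lower variance
    return ordinary, weighted
```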
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden...