This paper studies the problem of data collection for policy evaluation in Markov decision processes (MDPs). In policy evaluation, we are given a target policy and asked to estimate the expected cumulative reward it will obtain in an environment formalized as an MDP. We develop theory for optimal data collection within the class of tree-structured MDPs by first deriving an oracle data collection strategy that uses knowledge of the variance of the reward distributions. We then introduce the Reduced Variance Sampling (ReVar) algorithm that approximates the oracle strategy when the reward variances are unknown a priori and bound its sub-optimality compared to the oracle strategy. Finally, we empirically validate that ReVar leads to policy eval...
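As a rough illustration of why reward variances matter for data collection, consider the simplest special case: a single state with several actions (a bandit), which is not the tree-structured setting the paper analyzes and is not the ReVar algorithm itself. Under the assumption that each action's mean reward is estimated from independent samples, the variance of the plug-in estimate of the policy value is minimized by sampling action a in proportion to π(a)·σ(a). The function names and the numbers below are hypothetical and chosen only for this sketch.

```python
def oracle_allocation(policy, sigmas):
    """Sampling proportions b(a) proportional to pi(a) * sigma(a).

    Assumes the reward standard deviations `sigmas` are known, which is
    the oracle setting; both arguments are illustrative inputs.
    """
    weights = [p * s for p, s in zip(policy, sigmas)]
    total = sum(weights)
    if total == 0:
        # All rewards are deterministic: any allocation works, so fall
        # back to sampling from the target policy.
        return list(policy)
    return [w / total for w in weights]


def estimator_variance(policy, sigmas, proportions, n):
    """Variance of sum_a pi(a) * mean_hat(a) when action a receives
    n * proportions[a] independent samples."""
    return sum(
        (p * s) ** 2 / (n * b)
        for p, s, b in zip(policy, sigmas, proportions)
        if p > 0 and s > 0  # zero-weight or noiseless actions add no variance
    )


# Hypothetical two-action example: the second action is three times noisier.
policy = [0.5, 0.5]
sigmas = [1.0, 3.0]
n = 100

oracle = oracle_allocation(policy, sigmas)
var_on_policy = estimator_variance(policy, sigmas, policy, n)
var_oracle = estimator_variance(policy, sigmas, oracle, n)
print(oracle, var_on_policy, var_oracle)
```

In this toy case the oracle shifts samples toward the noisier action and yields a lower estimator variance than sampling from the target policy directly. When the variances are unknown they would have to be estimated from data, which is the situation the abstract describes ReVar as addressing.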
We study algorithms using randomized value functions for exploration in reinforcement learning. This...
How can we effectively exploit the collected samples when solving a continuous control task with Rei...
In this paper, we propose a novel reinforcement-learning algorithm consisting ...
A central challenge to applying many off-policy reinforcement learning algorithms to real world prob...
In control systems theory, the Markov decision process (MDP) is a widely used optimization model inv...
For reinforcement learning on complex stochastic systems where many factors dynamically impact the o...
The evaluation of rare but high-stakes events remains one of the main difficulties in obtaining reli...
Marginalized importance sampling (MIS), which measures the density ratio between the state-action oc...
Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solvi...
The intersection of causal inference and machine learning for decision-making is rapidly expanding, ...
Policy gradient algorithms in reinforcement learning optimize the policy directly and rely on effici...
A. Choices were more random for more noisy reward distributions (i.e. high values of βj) and for mea...
Off-policy methods are the basis of a large number of effective Policy Optimization (PO) algorithms....
Reinforcement learning means finding the optimal course of action in Markovian environment...
This thesis is devoted to designing and analyzing statistical decision rules to improve public polic...