This paper is concerned with constructing a confidence interval for a target policy's value offline, based on pre-collected observational data in infinite-horizon settings. Most existing works assume that no unmeasured variables confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and the technology industry. In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provide rigorous uncertaint...
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative rew...
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially...
Reinforcement learning is a general technique that allows an agent to learn an optimal policy and in...
This paper is concerned with constructing a confidence interval for a target policy’s value offline ...
This article is concerned with constructing a confidence interval for a target policy’s value offlin...
Off-policy evaluation learns a target policy’s value with a historical dataset generated by a differ...
Offline reinforcement learning involves training a decision-making agent based solely on historical ...
We study offline reinforcement learning (RL) in the face of unmeasured confounders. Due to the l...
Reinforcement learning is a general technique that allows an agent to learn an optimal policy and in...
Reinforcement learning is a general technique that allows an agent to learn an optimal policy and in...
Reinforcement learning is a general technique that allows an agent to learn an optimal policy and in...
Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement...
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), ...
We study offline reinforcement learning (RL) in partially observable Markov decision processes. In p...
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), ...