We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes (POMDPs), where the evaluation policy depends only on observable variables and the behavior policy depends on unobservable latent variables. Existing works either assume no unmeasured confounders, or focus on settings where both the observation and the state spaces are tabular. In this work, we first propose novel identification methods for OPE in POMDPs with latent confounders, by introducing bridge functions that link the target policy’s value and the observed data distribution. We next propose minimax estimation methods for learning these bridge functions, and construct three estimators based on these estimated bridge functions, corresponding to a v...
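As a worked illustration of the bridge-function idea, consider a single-step analogue borrowed from proximal causal inference; the multi-step POMDP construction is more involved, and the symbols $Z$, $W$, $h$, $q$ below are illustrative rather than the paper's exact notation. With action $A$, reward $R$, and two observable proxies $Z$ (action-side) and $W$ (reward-side) of the latent state $S$, a reward bridge function $h$ is any solution of the conditional moment restriction
\[
  \mathbb{E}\!\left[\, h(W, a) \mid Z,\, A = a \,\right]
  \;=\;
  \mathbb{E}\!\left[\, R \mid Z,\, A = a \,\right]
  \quad \text{for every action } a,
\]
so that, under a completeness condition, the value of action $a$ is identified from observables alone as $\mathbb{E}[R(a)] = \mathbb{E}[h(W, a)]$. A minimax estimator in the spirit described above then learns $h$ by playing it against a critic class $\mathcal{Q}$,
\[
  \hat{h}
  \;=\;
  \arg\min_{h \in \mathcal{H}} \; \max_{q \in \mathcal{Q}} \;
  \mathbb{E}_n\!\left[\, q(Z, A)\,\bigl( R - h(W, A) \bigr) \,\right]
  \;-\; \lambda\, \lVert q \rVert_{\mathcal{Q}}^2,
\]
where the critic $q$ enforces the conditional moment restriction, $\lambda$ is a regularization weight, and both $\mathcal{H}$ and $\mathcal{Q}$ may be flexible function classes, which is what permits general function approximation beyond tabular observation and state spaces.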
Partially observable Markov decision processes (POMDPs) model decision problems in which an a...
The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning dom...
Offline policy evaluation (OPE) is considered a fundamental and challenging problem in reinforcement...
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially...
Partially Observable Markov Decision Processes (POMDPs) provide a rich representation for agents act...
We study offline reinforcement learning (RL) in partially observable Markov decision processes. In p...
This paper studies the off-policy evaluation problem, where one aims to estimate the value of a targ...
Partially observable Markov decision processes (POMDPs) are interesting because they provide a gener...
This study extends the framework of partially observable Markov decision processes (POMDPs) ...
The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligenc...
In a partially observable Markov decision process (POMDP), if the reward can be observed at each ste...
Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning contr...