206 pages. Recent advances in reinforcement learning (RL) provide exciting potential for making agents learn, plan and act effectively in uncertain environments. Most existing algorithms in RL rely on known environments or the existence of a good simulator, where it is cheap to explore and collect the training data. However, this is not the case for human-centered interactive systems, in which online sampling or experimentation is costly, dangerous, or even illegal. This dissertation advocates an alternative data-driven approach that aims to evaluate and improve the performance of intelligent systems by only using the logged data from prior versions of the system (a.k.a. off-policy evaluation and learning). While such data is collected in lar...
This paper considers how to complement offline reinforcement learning (RL) data with additional data...
Offline Reinforcement Learning (RL) aims at learning an optimal control from a...
Consider an autonomous teacher agent trying to adaptively sequence material to best keep a student e...
In this dissertation we develop new methodologies and frameworks to address challenges in offline re...
Offline reinforcement learning involves training a decision-making agent based solely on historical ...
Off-policy policy evaluation (OPE) is the problem of estimating the online performance of a policy u...
The central question addressed in this research is “can we define evaluation methodologies that enco...
Offline reinforcement learning (RL) has received rising interest due to its appealing data efficien...
Many reinforcement learning algorithms use trajectories collected from the execution of one or more ...
Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is imprac...
Learning from interaction with the environment -- trying untested actions, observing successes and f...
Eligibility traces have been shown to speed reinforcement learning, to make it more robust to hidden...
In many real-world reinforcement learning problems, we have access to an existing dataset and would ...
We present an initial study of off-policy evaluation (OPE), a problem prerequisite to real-world rei...
We present a model-based offline reinforcement learning policy performance lower bound that explicit...