The goal in offline data-driven decision-making is to synthesize decisions that optimize a black-box utility function, using a previously collected static dataset, with no active interaction. These problems appear in many forms: offline reinforcement learning (RL), where we must produce actions that optimize the long-term reward; bandits from logged data, where the goal is to determine the correct arm; and offline model-based optimization (MBO) problems, where we must find the optimal design given access to only a static dataset. A key challenge in all these settings is distributional shift: when we optimize with respect to the input to a model trained from offline data, it is easy to produce an out-of-distribution (OOD) input that appear...
Offline reinforcement learning (RL) promises the ability to learn effective policies solely using ex...
In many Reinforcement Learning (RL) tasks, the classical online interaction of the learning agent wi...
Offline reinforcement learning (RL) addresses the problem of learning a performant policy from a fix...
A common use case of machine learning in real world settings is to learn a model from historical dat...
Existing offline reinforcement learning (RL) algorithms typically assume that training data is eithe...
The application of Reinforcement Learning (RL) in real world environments can be expensive or risky ...
Model-based offline reinforcement learning (RL), which builds a supervised transition model with log...
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task t...
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learni...
Interacting with the actual environment to acquire data is often costly and time-consuming in roboti...
In offline RL, constraining the learned policy to remain close to the data is essential to prevent t...
Offline reinforcement learning (RL) optimizes the policy on a previously collected dataset without a...
The ability to discover optimal behaviour from fixed data sets has the potential to transfer the suc...
In some applications of reinforcement learning, a dataset of pre-collected experience is already ava...
Offline reinforcement learning involves training a decision-making agent based solely on historical ...