Offline reinforcement learning (RL) struggles in environments with rich and noisy inputs, where the agent has access only to a fixed dataset and cannot interact with the environment. Prior work has proposed workarounds based on pre-training state representations, followed by policy training. In this work, we introduce a simple yet effective approach for learning state representations. Our method, Behavior Prior Representation (BPR), learns state representations with an easy-to-integrate objective based on behavior cloning of the dataset: we first learn a state representation by mimicking actions from the dataset, and then train a policy on top of the fixed representation, using any off-the-shelf offline RL algorithm. Theoretically, ...
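The two-stage recipe described in this abstract (learn a representation by behavior cloning, then freeze it for policy training) can be illustrated with a minimal sketch. It is not the authors' implementation: the linear encoder `W`, the linear action head `v`, the synthetic dataset, and the helper names `encode`, `bc_loss`, `init_params`, and `pretrain` are all assumptions made for illustration, and the downstream offline RL algorithm is left as a placeholder.

```python
import random

def encode(W, s):
    """Hypothetical linear encoder phi(s) = W s, with W of shape (repr_dim, state_dim)."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]

def bc_loss(W, v, data):
    """Behavior-cloning loss: mean squared error between predicted and dataset actions."""
    total = 0.0
    for s, a in data:
        pred = sum(vi * zi for vi, zi in zip(v, encode(W, s)))
        total += (pred - a) ** 2
    return total / len(data)

def init_params(state_dim, repr_dim, rng):
    """Small random initialisation for the encoder W and the action head v."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(state_dim)] for _ in range(repr_dim)]
    v = [rng.uniform(-0.1, 0.1) for _ in range(repr_dim)]
    return W, v

def pretrain(W, v, data, lr=0.05, epochs=100):
    """Stage 1: fit encoder and action head by per-sample SGD on the cloning loss."""
    for _ in range(epochs):
        for s, a in data:
            z = encode(W, s)
            err = sum(vi * zi for vi, zi in zip(v, z)) - a
            # Gradients of 0.5 * err**2 with respect to v and W.
            grad_v = [err * zi for zi in z]
            grad_W = [[err * vi * x for x in s] for vi in v]
            v = [vi - lr * g for vi, g in zip(v, grad_v)]
            W = [[w - lr * g for w, g in zip(row, g_row)]
                 for row, g_row in zip(W, grad_W)]
    return W, v

# Synthetic fixed dataset (an assumption): the behavior policy depends on the
# first two state dimensions only; the third dimension is a distractor.
rng = random.Random(0)
states = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(64)]
data = [(s, 0.8 * s[0] - 0.5 * s[1]) for s in states]

W0, v0 = init_params(state_dim=3, repr_dim=2, rng=rng)
initial_loss = bc_loss(W0, v0, data)
W, v = pretrain(W0, v0, data)
final_loss = bc_loss(W, v, data)

# Stage 2: the encoder is frozen; only its outputs are handed downstream.
# Any off-the-shelf offline RL algorithm would be trained on these features.
features = [encode(W, s) for s, _ in data]
print("cloning loss before and after pre-training:", initial_loss, final_loss)
```

In this sketch the action head `v` only shapes the representation during pre-training; whether it is reused or discarded afterwards is a design choice the abstract does not specify.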
The ability to discover optimal behaviour from fixed data sets has the potential to transfer the suc...
Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously collec...
Offline reinforcement learning (RL) methods strike a balance between exploration and exploitation by...
Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for ...
The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavio...
Reinforcement Learning (RL) agents can solve general problems based on little to no knowledge of the...
We propose a new model-based offline RL framework, called Adversarial Models for Offline Reinforceme...
Reinforcement Learning (RL) agents can solve general problems based on little to no knowledge of the...
The application of Reinforcement Learning (RL) in real world environments can be expensive or risky ...
Offline reinforcement learning enables learning from a fixed dataset, without further interactions w...
We present a model-based offline reinforcement learning policy performance lower bound that explicit...
Pessimism is of great importance in offline reinforcement learning (RL). One broad category of offli...
We present state advantage weighting for offline reinforcement learning (RL). In contrast to action ...
Conventional reinforcement learning (RL) needs an environment to collect fresh data, which is imprac...
Offline Reinforcement Learning (RL) aims at learning an optimal control from a fixed dataset, withou...