The application of Reinforcement Learning (RL) in real-world environments can be expensive or risky because policies are sub-optimal during training. Offline RL avoids this problem, since interactions with an environment are prohibited. Policies are learned from a given dataset, which solely determines their performance. Despite this, how dataset characteristics influence Offline RL algorithms has hardly been investigated. The dataset characteristics are determined by the behavioral policy that samples the dataset. Therefore, we define characteristics of behavioral policies as exploratory for yielding high expected information in their interaction with the Markov Decision Process (MDP) and as exploitative for having high expected re...
We consider the offline reinforcement learning problem, where the aim is to learn a decision making ...
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based an...
Learning policies from previously recorded data is a promising direction for real-world robotics tas...
In some applications of reinforcement learning, a dataset of pre-collected experience is already ava...
Existing offline reinforcement learning (RL) algorithms typically assume that training data is eithe...
Offline reinforcement learning enables learning from a fixed dataset, without further interactions w...
Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies f...
Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for ...
The ability to discover optimal behaviour from fixed data sets has the potential to transfer the suc...
We present state advantage weighting for offline reinforcement learning (RL). In contrast to action ...
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learni...
Offline reinforcement learning (RL) concerns pursuing an optimal policy for sequential decision-maki...
Model-based offline reinforcement learning (RL), which builds a supervised transition model with log...
In many real-world applications, collecting large and high-quality datasets may be too costly or imp...