Offline reinforcement learning (RL) defines the task of learning from a static logged dataset without further interaction with the environment. The distribution shift between the learned policy and the behavior policy requires the value function to stay conservative so that the values of out-of-distribution (OOD) actions are not severely overestimated. However, existing approaches, which penalize unseen actions or regularize toward the behavior policy, are too pessimistic; this suppresses the generalization of the value function and hinders performance improvement. This paper explores conservatism that is mild but sufficient for offline learning and does not harm generalization. We propose Mildly Conservative Q-learning (MCQ), where OOD ...
Offline reinforcement learning (RL) enables effective learning from previously collected data withou...
The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavio...
Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy ...
Offline reinforcement learning (RL) promises the ability to learn effective policies solely using ex...
We study the problem of safe offline reinforcement learning (RL), where the goal is to learn a policy that...
Offline multi-agent reinforcement learning is challenging due to the coupling effect of both distrib...
Offline reinforcement learning (RL) provides a promising direction to exploit the massive amount of ...
Offline reinforcement learning (RL) aims to learn a policy from the passively collected offline datase...
Offline reinforcement learning algorithms still lack trust in practice due to the risk that the lear...
In many Reinforcement Learning (RL) tasks, the classical online interaction of the learning agent wi...
In this dissertation we develop new methodologies and frameworks to address challenges in offline re...
Offline reinforcement learning (RL) presents a promising approach for learning reinforced policies f...
Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for ...
In offline reinforcement learning (RL), one detrimental issue to policy learning is the error accumu...
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to purely learni...