Abstract. In this paper we address the reliability of policies derived by Reinforcement Learning from a limited number of observations. This can be done in a principled manner by taking into account the uncertainty of the derived Q-function, which stems from the uncertainty of the estimators used for the MDP's transition probabilities and reward function. We apply uncertainty propagation in parallel with the Bellman iteration and obtain confidence intervals for the Q-function. In a second step we modify the Bellman operator so as to obtain a policy that guarantees the highest minimum performance with a given probability. We demonstrate the functionality of our method on artificial examples and show that, for an important problem class, even an enhancem...
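The idea in this abstract, propagating uncertainty alongside the Bellman iteration and then acting on a lower confidence bound, can be illustrated with a minimal sketch. It is not the paper's method: it assumes a tabular MDP, treats all estimator errors as independent (a diagonal approximation that ignores covariances), and uses first-order Gaussian error propagation. The function name `uncertain_bellman_iteration` and the parameters `var_P`, `var_R`, and `xi` are hypothetical names chosen for illustration.

```python
import numpy as np

def uncertain_bellman_iteration(P, R, var_P, var_R, gamma=0.9, xi=0.0, n_iter=200):
    """Value iteration with a diagonal, first-order variance update.

    P, R, var_P, var_R have shape (S, A, S): estimated transition
    probabilities, rewards, and the (assumed independent) variances
    of those estimators. xi scales the confidence penalty.
    """
    S, A, _ = P.shape
    Q = np.zeros((S, A))
    var_Q = np.zeros((S, A))
    idx = np.arange(S)
    for _ in range(n_iter):
        # Act greedily on the lower bound Q - xi * sigma (xi = 0: plain greedy).
        pi = np.argmax(Q - xi * np.sqrt(var_Q), axis=1)
        V, var_V = Q[idx, pi], var_Q[idx, pi]
        target = R + gamma * V[None, None, :]  # R(s,a,s') + gamma * V(s')
        Q_new = np.sum(P * target, axis=2)
        # Squared partial derivatives of Q w.r.t. P, R and V, times variances.
        var_Q = np.sum(target**2 * var_P
                       + P**2 * var_R
                       + (gamma * P)**2 * var_V[None, None, :], axis=2)
        Q = Q_new
    sigma = np.sqrt(var_Q)
    pi = np.argmax(Q - xi * sigma, axis=1)
    return Q, sigma, pi

# Toy MDP (hypothetical): action 0 stays, action 1 switches state;
# reward 1 is paid whenever the agent lands in state 1.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[0, 1, 1] = P[1, 0, 1] = P[1, 1, 0] = 1.0
R = np.zeros((2, 2, 2))
R[:, :, 1] = 1.0
zero = np.zeros_like(P)

Q, sigma, pi = uncertain_bellman_iteration(P, R, zero, zero, gamma=0.5)
Q_u, sigma_u, pi_u = uncertain_bellman_iteration(
    P, R, zero, np.full_like(R, 0.01), gamma=0.5, xi=1.0)
```

With zero estimator variance the sketch reduces to ordinary value iteration and `sigma` stays at zero. With `xi > 0` the greedy step can oscillate between policies, so the loop runs for a fixed number of iterations rather than testing for convergence.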
Reinforcement learning is a general technique that allows an agent to learn an optimal policy and in...
Offline reinforcement learning, or learning from a fixed data set, is an attractive alternative to o...
How does the uncertainty of the value function propagate when performing temporal difference ...
Abstract. Reinforcement learning aims to derive an optimal policy for an often initially unknown en...
Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-fr...
We consider the model-based reinforcement learning framework where we are interested in learning a m...
Model-free reinforcement learning based methods such as Proximal Policy Optimization, or Q-learning ...
Handling uncertainty is an important part of decision-making. Leveraging uncertainty for guiding exp...
Reinforcement learning (RL) is recognized as lacking generalization and robustness under environment...
In many Reinforcement Learning (RL) tasks, the classical online interaction of the learning agent wi...
In order for reinforcement learning techniques to be useful in real-world decision making processes,...
Decision theory addresses the task of choosing an action; it provides robust decision-making criteri...
In online reinforcement learning, operators predict the return by weighting the successors’ estimate...