We present a reinforcement learning approach for exploring and optimizing a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while keeping the probability of entering unsafe states, defined via a safety function, within a given tolerance. The safety values of the states are not known a priori, and we model them probabilistically via a Gaussian Process (GP) prior. Behaving properly in such an environment therefore requires balancing a three-way trade-off between exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited ...
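One way to formalize the setting just described (the notation below is ours, not taken from the abstract): the agent maximizes discounted return subject to a chance constraint on unsafe states, with the unknown safety function g modeled by a GP prior,
\[
\max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty}\gamma^{t}\, r(s_t,a_t)\Big]
\quad\text{s.t.}\quad
\Pr_{\pi}\!\big(g(s_t) < h \ \text{for some } t\big) \le \delta,
\qquad g \sim \mathcal{GP}(\mu_0, k),
\]
where h is the safety threshold, \delta the tolerated probability of entering unsafe states, \gamma the discount factor, and \mu_0, k the GP prior mean and covariance.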
www.cs.tu-berlin.de/~geibel Abstract. In this article, I will consider Markov Decision Processes wit...
Markov decision processes (MDPs) are a standard modeling tool for sequential decision making in a dyna...
We address the issue of safety in reinforcement learning. We pose the problem in an episodic framewo...
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the pre...
In environments with uncertain dynamics, exploration is necessary to learn how to perform well. Exi...
This paper concerns the efficient construction of a safety shield for reinforcement learning. We spe...
In reinforcement learning (RL), an agent must explore an initially unknown environment in order to l...
In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are thos...
When exploring an unknown environment, a mobile robot must decide where to observe next. It must do ...
Many physical systems have underlying safety considerations that require that the policy employed en...
We consider sequential decision problems under uncertainty, where we seek to optimize an unknown fun...
Reinforcement learning for robotic applications faces the challenge of constraint satisfa...
Often the most practical way to define a Markov Decision Process (MDP) is as a simulator that, given...
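As a concrete illustration of that simulator-based view (a minimal sketch; the class and toy dynamics below are ours, not taken from the cited work), an MDP can be exposed purely as a sampler that maps a state and an action to a next state and a reward:

import random

class SimulatorMDP:
    # A generative-model MDP: no explicit transition matrix, only a sampler
    # that, given a state and an action, returns a next state and a reward.
    def __init__(self, n_states=5, seed=0):
        self.n_states = n_states
        self.rng = random.Random(seed)

    def step(self, state, action):
        # Toy dynamics: the action nudges the state, with random noise added.
        noise = self.rng.choice([-1, 0, 1])
        next_state = max(0, min(self.n_states - 1, state + action + noise))
        reward = 1.0 if next_state == self.n_states - 1 else 0.0  # goal reward
        return next_state, reward

# Usage: planning or learning algorithms interact only through step().
sim = SimulatorMDP()
state = 0
for _ in range(10):
    state, reward = sim.step(state, action=1)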
Acting in domains where an agent must plan several steps ahead to achieve a goal can be a ch...
Replicating the human ability to solve complex planning problems based on minimal prior knowledge ha...