Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual approaches suffer from instability and lack optimality guarantees. This paper addresses these issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach that naturally incorporates constraints during policy learning: 1) a provably optimal non-parametric variational distribution can be computed in closed form after a convex optimization (E-step); 2) the policy parameter is improved within a trust region based on the optimal variational distribution (M-step). The proposed algorithm decomposes the safe RL problem i...
Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
Many physical systems have underlying safety considerations that require that the policy employed en...
In safe Reinforcement Learning (RL), the agent attempts to find policies which maximize the expectat...
Reinforcement learning (RL) agents need to explore their environments in order to learn optimal poli...
In safe Reinforcement Learning (RL), the agent attempts to find policies which maximize the expectat...
We consider the safe reinforcement learning (RL) problem of maximizing utility with extremely low co...
This letter aims to solve a safe reinforcement learning (RL) problem with risk measure-based constra...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) ...
Safety exploration can be regarded as a constrained Markov decision problem where the expected long-...
Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward si...
To apply reinforcement learning (RL) to real-world applications, agents are required to adhere to th...
Reinforcement learning is an increasingly popular framework that enables robots to learn to perform ...
Model-based reinforcement learning algorithms have been shown to achieve successful results on vario...
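The E-step described in the first abstract (a closed-form non-parametric distribution obtained after a convex optimization) can be illustrated with a small sketch. The abstract does not give the functional form, so everything here is an assumption: a single state with a handful of discrete actions, a KL bound `eps_kl` to the current policy `prior`, a cost limit `eps_cost`, and the resulting weights q(a) ∝ prior(a) · exp((Q_r(a) − λ Q_c(a)) / η). The function names and the coarse grid search are hypothetical stand-ins for illustration, not the authors' implementation, and the M-step is not shown.

```python
import math


def _log_partition(prior, q_reward, q_cost, eta, lam):
    """Return (logits, log Z) for weights prior * exp((Q_r - lam * Q_c) / eta)."""
    # Assumes every prior probability is strictly positive.
    logits = [math.log(p) + (r - lam * c) / eta
              for p, r, c in zip(prior, q_reward, q_cost)]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits, log_z


def e_step_weights(prior, q_reward, q_cost, eta, lam):
    """Assumed closed-form variational distribution for fixed multipliers."""
    logits, log_z = _log_partition(prior, q_reward, q_cost, eta, lam)
    return [math.exp(l - log_z) for l in logits]


def dual(prior, q_reward, q_cost, eta, lam, eps_kl, eps_cost):
    """Dual objective in (eta, lam); convex for eta > 0, to be minimized."""
    _, log_z = _log_partition(prior, q_reward, q_cost, eta, lam)
    return lam * eps_cost + eta * eps_kl + eta * log_z


def solve_dual(prior, q_reward, q_cost, eps_kl, eps_cost):
    """Coarse grid search over (eta, lam); a stand-in for a real convex solver."""
    etas = [0.05 * k for k in range(1, 101)]  # eta in (0, 5]
    lams = [0.1 * k for k in range(0, 101)]   # lam in [0, 10]
    return min(
        ((e, l) for e in etas for l in lams),
        key=lambda el: dual(prior, q_reward, q_cost, el[0], el[1],
                            eps_kl, eps_cost),
    )


def expected(values, q):
    """Expectation of per-action values under distribution q."""
    return sum(v * w for v, w in zip(values, q))


# Hypothetical toy numbers: four actions, the last one rewarding but costly.
prior = [0.25, 0.25, 0.25, 0.25]
q_reward = [1.0, 2.0, 3.0, 4.0]
q_cost = [0.0, 0.5, 1.0, 3.0]

eta, lam = solve_dual(prior, q_reward, q_cost, eps_kl=0.1, eps_cost=1.0)
q = e_step_weights(prior, q_reward, q_cost, eta, lam)
```

Under these assumptions, a larger `lam` moves probability mass away from costly actions, while a larger `eta` keeps `q` closer to `prior`. A gradient-based or off-the-shelf convex solver would normally replace the grid search; it is used here only to keep the sketch dependency-free.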