We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, a problem that plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time, as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms for time-varying environments is particularly challenging because such algorithms must integrate constraint-violation reduction, safe exploration, and adaptation to non-stationarity. To this end, we identify two alternative conditions on the time-varying c...
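The abstract above relies on two quantities: a cumulative variation that must stay within a known budget, and a primal-dual scheme for the constraint. The sketch below is a minimal illustration under stated assumptions and is not the paper's algorithm. It assumes that variation is measured as the sum over consecutive episodes of the sup-norm change in a reward (or utility) table. It also assumes the constraint has the form V_g >= b and that the multiplier is updated by projected dual ascent onto [0, lam_max]. All function names are hypothetical.

```python
import numpy as np


def cumulative_variation(tables):
    """Hypothetical variation measure: sum_k max |f_{k+1} - f_k| over consecutive episodes.

    `tables` is a list of per-episode reward (or utility) arrays of equal shape.
    Papers differ in the exact norm used; the sup-norm here is an assumption.
    """
    return float(sum(np.max(np.abs(nxt - cur)) for cur, nxt in zip(tables[:-1], tables[1:])))


def lagrangian_reward(reward, utility, lam):
    """Primal step signal (assumed form): the policy maximizes r + lam * g."""
    return reward + lam * utility


def dual_update(lam, utility_value, threshold, step_size, lam_max):
    """Projected dual ascent for an assumed constraint V_g >= b.

    lam grows when the estimated utility value falls below the threshold b,
    and the result is projected back onto [0, lam_max].
    """
    return float(np.clip(lam + step_size * (threshold - utility_value), 0.0, lam_max))


if __name__ == "__main__":
    rewards = [np.zeros((2, 3)), np.full((2, 3), 0.5), np.full((2, 3), 1.0)]
    budget = 1.0  # hypothetical known variation budget
    print(cumulative_variation(rewards) <= budget)
    print(dual_update(lam=0.0, utility_value=0.2, threshold=0.5, step_size=1.0, lam_max=10.0))
```

The projection keeps the multiplier bounded. When the constraint is satisfied, lam shrinks toward zero, so the primal step falls back to maximizing reward alone.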
Constrained Markov Decision Processes (CMDPs) are a common way to model safe reinforcement...
We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and ...
We introduce a model-free algorithm for learning in Markov decision processes with parameterized act...
We consider the problem of constrained Markov decision processes (CMDPs) in continuous state-action sp...
Reinforcement learning is widely used in applications where one needs to perform sequential decision...
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic no...
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision proces...
In reinforcement learning (RL), an agent must explore an initially unknown environment in order to l...
We address the issue of safety in reinforcement learning. We pose the problem in an episodic framewo...
Reinforcement learning in non-stationary environments is generally regarded as a very difficult prob...
We consider large-scale Markov decision processes with an unknown cost function and address the prob...
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under dri...
Reinforcement learning is a family of machine learning algorithms in which the system learns to mak...