We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, a setting that plays a central role in ensuring the safety of RL in time-varying environments. In this problem, the reward/utility functions and the state transition functions are both allowed to vary arbitrarily over time, as long as their cumulative variations do not exceed certain known variation budgets. Designing safe RL algorithms in time-varying environments is particularly challenging because it requires integrating constraint-violation reduction, safe exploration, and adaptation to non-stationarity. To this end, we identify two alternative conditions on the time-varying c...
Reinforcement learning in non-stationary environments is generally regarded as a very difficult prob...
Reinforcement learning (RL) has emerged as a general-purpose technique for addressing problems invol...
We study convex Constrained Markov Decision Processes (CMDPs) in which the objective is concave and ...
We consider the problem of constrained Markov decision processes (CMDPs) in continuous state-action sp...
Reinforcement learning is widely used in applications where one needs to perform sequential decision...
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic no...
We consider undiscounted reinforcement learning (RL) in Markov decision processes (MDPs) under dri...
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision proces...
In reinforcement learning (RL), an agent must explore an initially unknown environment in order to l...
We consider large-scale Markov decision processes with an unknown cost function and address the prob...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...
The paper investigates the possibility of applying value-function-based reinforcement learning (RL)...
We address the issue of safety in reinforcement learning. We pose the problem in an episodic framewo...
We introduce a model-free algorithm for learning in Markov decision processes with parameterized act...