Constrained Markov Decision Processes (CMDPs) are a common way to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors": one can compensate for a constraint violation in one episode with strict constraint satisfaction in another. However, we do not consider such behavior safe in practical applications. In this paper, we overcome this weakness by proposing a novel model-based dual algorithm OptAug-CMDP for tabular finite-horizon CMDPs. Our algorithm is motivated by t...
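To make the "cancellation of errors" concrete, here is a minimal sketch that assumes each episode k produces an expected constraint cost g_k that should stay at or below a threshold b. It contrasts a signed cumulative violation, where under-budget episodes offset over-budget ones, with a positive-part cumulative violation, where they cannot. The function names, costs, and threshold are hypothetical illustrations and are not taken from the OptAug-CMDP paper.

```python
from typing import Sequence


def weak_violation(costs: Sequence[float], threshold: float) -> float:
    """Signed cumulative violation: sum_k (g_k - b).

    Episodes with g_k < b contribute negative terms, so they can
    cancel violations from other episodes.
    """
    return sum(g - threshold for g in costs)


def strong_violation(costs: Sequence[float], threshold: float) -> float:
    """Positive-part cumulative violation: sum_k max(0, g_k - b).

    Strict satisfaction in one episode cannot offset a violation in another.
    """
    return sum(max(0.0, g - threshold) for g in costs)


if __name__ == "__main__":
    # Hypothetical expected constraint costs of the policies played in four episodes.
    episode_costs = [1.5, 0.5, 1.5, 0.5]
    budget = 1.0  # hypothetical constraint threshold b
    print(weak_violation(episode_costs, budget))    # 0.0: violations cancel out
    print(strong_violation(episode_costs, budget))  # 1.0: violations accumulate
```

In this toy run the constraint is violated in half of the episodes, yet the signed measure reports zero; only the positive-part measure registers the unsafe episodes.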
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
We study online reinforcement learning in linear Markov decision processes with adversarial losses a...
We consider the problem of online reinforcement learning when several state re...
We address the issue of safety in reinforcement learning. We pose the problem in an episodic framewo...
Incorporating safety is an essential prerequisite for broadening the practical applications of reinf...
We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with ...
This paper considers the best policy identification (BPI) problem in online Constrained Markov Decis...
We consider an agent interacting with an environment in a single stream of act...
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision pr...
We consider a discounted cost constrained Markov decision process (CMDP) policy optimization problem...
Safe reinforcement learning is extremely challenging. Not only must the agent explore an unknown env...
We study the role of the representation of state-action value functions in reg...
We consider the problem of constrained Markov decision process (CMDP) in continuous state actions sp...
Safety exploration can be regarded as a constrained Markov decision problem where the expected long-...