We consider constrained Markov Decision Processes (CMDPs), in which an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. In addition, there are $K$ cost functions. The agent aims to maximize the long-term average reward while simultaneously keeping the $K$ long-term average costs below a certain threshold. In this paper, we propose CMDP-PSRL, a posterior-sampling-based algorithm with which the agent can learn optimal policies for interacting with the CMDP. Further, for an MDP with $S$ states, $A$ actions, and diameter $D$, we prove that by following the CMDP-PSRL algorithm, the agent's regret for not accumulating rewards from the optimal policy is bounded by $\tilde{O}(\mathrm{poly}(DSA)\sqrt{T})$. Fu...
In this paper we address the following basic feasibility problem for infinite-horizon Markov decisio...
In these notes we will tackle the problem of finding optimal policies for Markov decision processes ...
(Youqiang HUANG) Constrained Markov decision processes with compact state and action...
We consider an agent interacting with an environment in a single stream of act...
We consider the batch (off-line) policy learning problem in the infinite horizon Markov Decision Pro...
We settle the sample complexity of policy learning for the maximization of the long run average rewa...
This paper considers the best policy identification (BPI) problem in online Constrained Markov Decis...
We consider an MDP setting in which the reward function is allowed to change during each time step o...
Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement...
We consider reinforcement learning in a discrete, undiscounted, infinite-horiz...
We consider an agent interacting with an environment in a single stream of actions, observations, an...
We present an algorithm called Optimistic Linear Programming (OLP) for learning to optimize average ...
The problem of reinforcement learning in an unknown and discrete Markov Decisi...
We consider the problem of minimizing the long term average expected regret of an agent in an online...