We propose a reinforcement learning (RL) approach to compute an expression for the quasi-stationary distribution. Starting from the fixed-point formulation of the quasi-stationary distribution, we minimize the KL divergence between two Markovian path distributions, one induced by the candidate distribution and the other by the true target distribution. To solve this challenging minimization problem by gradient descent, we apply a reinforcement learning technique by introducing reward and value functions. We derive the corresponding policy gradient theorem and design an actor-critic algorithm to learn the optimal solution and the value function. Numerical examples on finite-state Markov chains demonstrate the new method.
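To make the fixed-point formulation concrete, the following minimal sketch computes the quasi-stationary distribution of a small finite-state chain by plain fixed-point iteration. It is not the actor-critic algorithm proposed here, only a classical baseline for the same fixed point, alpha = alpha Q / (alpha Q 1), where Q is the sub-stochastic transition matrix restricted to the non-absorbing states. The function name `qsd_fixed_point` and the 3-state matrix are hypothetical and chosen purely for illustration.

```python
def qsd_fixed_point(Q, tol=1e-12, max_iter=100_000):
    """Iterate alpha <- alpha Q / (alpha Q 1) until the update is below tol.

    Q is a sub-stochastic matrix (list of rows) over the non-absorbing states;
    the mass missing from each row is the one-step absorption probability.
    Returns the approximate quasi-stationary distribution and the one-step
    survival probability under it.
    """
    n = len(Q)
    alpha = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the killed chain: (alpha Q)_j = sum_i alpha_i * Q[i][j]
        stepped = [sum(alpha[i] * Q[i][j] for i in range(n)) for j in range(n)]
        survival = sum(stepped)  # probability of not being absorbed
        if survival <= 0.0:
            raise ValueError("all probability mass was absorbed in one step")
        # Condition on survival, i.e. renormalize to a probability vector
        new_alpha = [s / survival for s in stepped]
        if sum(abs(a - b) for a, b in zip(new_alpha, alpha)) < tol:
            return new_alpha, survival
        alpha = new_alpha
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical 3-state example (illustrative numbers, not from the paper).
# Row sums are 0.9, 0.9, and 0.8; the remainder is the absorption probability.
Q = [
    [0.5, 0.3, 0.1],
    [0.2, 0.5, 0.2],
    [0.1, 0.3, 0.4],
]
alpha, survival = qsd_fixed_point(Q)
print("quasi-stationary distribution:", alpha)
print("one-step survival probability:", survival)
```

For a chain this small the iteration converges quickly, which makes it a convenient reference solution when checking a learned candidate distribution on finite-state test problems.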
We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state...
In this paper we consider the problem of policy evaluation in reinforcement learning, i.e., learnin...
In reinforcement learning (RL), an agent interacts with the environment by taking actions and observ...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
We introduce a class of Markov decision problems (MDPs) which greatly simplify Reinforcement Learnin...
In this work we continue to build upon recent advances in reinforcement learning for finite Markov p...
This paper presents a reinforcement learning algorithm for solving infinite horizon Markov Decision ...
Most conventional policy gradient reinforcement learning (PGRL) algorithms neglect (or do not explic...
The problem of reinforcement learning in a non-Markov environment is explored using a dynami...