We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as inexact versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with arbitrary constant step size.
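To make the scheme described in this abstract concrete, the following is a minimal sketch (not code from the paper) of Q-NPG with a log-linear (softmax) policy and a geometrically increasing step size on a small synthetic MDP. The random feature map, the exact policy evaluation, the state-action weighting in the regression, and the step-size growth rate are all illustrative assumptions.

# Minimal Q-NPG sketch with a log-linear policy (illustrative, not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
S, A, d, gamma = 5, 3, 4, 0.9                 # states, actions, feature dim, discount

P = rng.dirichlet(np.ones(S), size=(S, A))    # transition kernel: P[s, a] is a distribution over s'
r = rng.uniform(size=(S, A))                  # rewards r(s, a)
phi = rng.normal(size=(S, A, d))              # assumed log-linear features phi(s, a)

def policy(theta):
    # Log-linear (softmax) policy: pi(a|s) proportional to exp(theta . phi(s, a)).
    logits = phi @ theta                      # shape (S, A)
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def q_values(pi):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi, then Q = r + gamma * P V.
    P_pi = np.einsum('sab,sa->sb', P, pi)     # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return r + gamma * P @ V                  # shape (S, A)

theta = np.zeros(d)
eta = 0.1                                     # initial step size (illustrative)
for t in range(50):
    pi = policy(theta)
    Q = q_values(pi)
    # Compatible function approximation: regress Q onto the features under a
    # crude on-policy weighting (uniform over states, pi over actions) -- an assumption.
    weights = (pi / S).reshape(-1)
    X = phi.reshape(S * A, d)
    w, *_ = np.linalg.lstsq(X * np.sqrt(weights)[:, None],
                            Q.reshape(-1) * np.sqrt(weights), rcond=None)
    theta = theta + eta * w                   # Q-NPG update in parameter space
    eta /= 0.9                                # geometrically increasing step size (illustrative rate)

print("greedy actions:", policy(theta).argmax(axis=1))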
We present four new reinforcement learning algorithms based on actor-critic, f...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
Despite its popularity in the reinforcement learning community, a provably convergent policy gradien...
We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-line...
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms ...
We consider the problem of designing sample efficient learning algorithms for infinite horizon disco...
Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and ...
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex ...
We study the global linear convergence of policy gradient (PG) methods for finite-horizon explorator...
We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with prov...
This paper is about the exploitation of Lipschitz continuity properties for Markov Decision Processe...
We study the global convergence of policy gradient for infinite-horizon, continuous state and action...
Policy gradient methods have been frequently applied to problems in control and reinforcement learni...
We provide a natural gradient method that represents the steepest descent direction based on the und...
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and ar...