We consider the problem of designing sample-efficient learning algorithms for infinite-horizon discounted-reward Markov Decision Processes. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm, which utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves $\mathcal{O}(\epsilon^{-2})$ sample complexity and $\mathcal{O}(\epsilon^{-1})$ iteration complexity with general parameterization, where $\epsilon$ defines the optimality error. This improves the state-of-the-art sample complexity by a $\log(\frac{1}{\epsilon})$ factor. ANPG is a first-order algorithm and, unlike some existing literature, does not require the unverifiable assumption that the variance of impo...
In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic varian...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
Gradient-based approaches to direct policy search in reinforcement learning have received much rece...
In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Disting...
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of...
We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-line...
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the develop...
Natural policy gradient (NPG) methods are among the most widely used policy optimization algorithms ...
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex ...
Partially observable Markov decision processes are interesting because of their ability to model mos...
We provide a natural gradient method that represents the steepest descent direction based on the und...
We settle the sample complexity of policy learning for the maximization of the long run average rewa...
We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with prov...
We study the problem of policy optimization for infinite-horizon discounted Markov Decision Process...
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...