We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence for learning the optimal linear estimator design, i.e., the Kalman filter (KF). Notably, RHPG requires neither prior knowledge of the system for initialization nor open-loop stability of the target system. The key idea behind RHPG is to integrate vanilla PG (or any other policy search direction) into a dynamic programming outer loop that iteratively decomposes the infinite-horizon KF problem, which is constrained and non-convex in the policy parameter, into a sequence of static estimation problems that are unconstrained and strongly convex, thus enabling global convergence. We further pro...
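To make the outer-loop/inner-loop structure concrete, below is a minimal NumPy sketch of the decomposition described above, under illustrative assumptions: the system matrices A, C, W, V, the step counts, the step size, and the function names (rhpg_kalman_gain, one_step_cost_grad) are all hypothetical, and the inner loop uses exact gradients of the static one-step cost J(L) = tr[(A - LC) Sigma (A - LC)^T + W + L V L^T], which is strongly convex in L whenever V is positive definite. This is a sketch of the idea, not the paper's implementation (a model-free variant would replace the exact gradient with a sampled policy gradient).

```python
import numpy as np

# Hypothetical system (illustrative values, not from the paper):
# x_{t+1} = A x_t + w_t,  y_t = C x_t + v_t,  w ~ N(0, W), v ~ N(0, V).
A = np.array([[1.2, 0.5], [0.0, 0.9]])   # note: A is open-loop unstable
C = np.array([[1.0, 0.0]])
W = 0.1 * np.eye(2)                       # process-noise covariance
V = 0.2 * np.eye(1)                       # measurement-noise covariance

def one_step_cost_grad(L, Sigma):
    """Gradient of the static one-step estimation cost
    J(L) = tr[(A - L C) Sigma (A - L C)^T + W + L V L^T],
    which is strongly convex in L since V > 0."""
    return 2.0 * (L @ (C @ Sigma @ C.T + V) - A @ Sigma @ C.T)

def rhpg_kalman_gain(horizon=60, pg_iters=500, lr=0.05):
    """Outer loop: at each stage, solve the static (unconstrained,
    strongly convex) problem for the gain by vanilla gradient descent,
    then propagate the estimation-error covariance forward."""
    Sigma = np.eye(2)                     # arbitrary initial error covariance
    L = np.zeros((2, 1))                  # no model knowledge needed to initialize
    for _ in range(horizon):
        for _ in range(pg_iters):         # inner loop: vanilla PG on the static problem
            L = L - lr * one_step_cost_grad(L, Sigma)
        Acl = A - L @ C                   # closed-loop error dynamics under the learned gain
        Sigma = Acl @ Sigma @ Acl.T + W + L @ V @ L.T
    return L

L_rhpg = rhpg_kalman_gain()

# Sanity check against the stationary Kalman gain from the Riccati recursion.
Sigma = np.eye(2)
for _ in range(500):
    L_kf = A @ Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + V)
    Sigma = (A - L_kf @ C) @ Sigma @ (A - L_kf @ C).T + W + L_kf @ V @ L_kf.T
print(np.allclose(L_rhpg, L_kf, atol=1e-3))  # expected: True
```

Because each static problem is a strongly convex quadratic in L, the inner gradient descent converges linearly, and the outer loop mirrors the Riccati recursion, which is why its fixed point is the stationary Kalman gain.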
Policy search is a method for approximately solving an optimal control problem...
We study the problem of receding horizon control for stochastic discrete-time...
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in non-convex ...
We study the global linear convergence of policy gradient (PG) methods for finite-horizon explorator...
Despite its popularity in the reinforcement learning community, a provably convergent policy gradien...
While the optimization landscape of policy gradient methods has been recently investigated for parti...
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of...
We consider the problem of designing sample efficient learning algorithms for infinite horizon disco...
Policy gradient (PG) methods are popular and efficient for large-scale reinforcement learning due to...
Gradient-based methods have been widely used for system design and optimization in diverse applicati...
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and ar...
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the develop...
We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-line...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Disting...