We study the global linear convergence of policy gradient (PG) methods for finite-horizon exploratory linear-quadratic control (LQC) problems. The setting includes stochastic LQC problems with indefinite costs and allows additional entropy regularisers in the objective. We consider a continuous-time Gaussian policy whose mean is linear in the state variable and whose covariance is state-independent. In contrast to discrete-time problems, the cost is noncoercive in the policy and not all descent directions lead to bounded iterates. We propose geometry-aware gradient descents for the mean and covariance of the policy using the Fisher geometry and the Bures-Wasserstein geometry, respectively. The policy iterates are shown to satisfy an a-priori b...
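The abstract above describes a two-block update: a Fisher (natural) gradient step for the policy mean and a Bures-Wasserstein step for the covariance. The sketch below illustrates one such step for a Gaussian policy a ~ N(theta x, Sigma), assuming the Euclidean gradients of the cost are available; the preconditioning by Sigma and E[x x^T], and the specific retraction (I - eta G) Sigma (I - eta G), are illustrative choices rather than the paper's exact scheme.

import numpy as np

def geometry_aware_pg_step(theta, Sigma, grad_theta, grad_Sigma,
                           state_cov, eta_mean, eta_cov):
    """One geometry-aware update for a Gaussian policy a ~ N(theta @ x, Sigma).

    Illustrative sketch only: grad_theta and grad_Sigma are assumed Euclidean
    gradients of the cost, state_cov approximates E[x x^T], and the step sizes
    are assumed small enough to keep Sigma in the SPD cone.
    """
    # Fisher (natural) gradient step for the mean: for a Gaussian policy the
    # Fisher information in theta factorises as E[x x^T] (Kronecker) Sigma^{-1},
    # so the natural gradient preconditions the Euclidean gradient by Sigma on
    # the left and by E[x x^T]^{-1} on the right.
    theta_next = theta - eta_mean * Sigma @ grad_theta @ np.linalg.inv(state_cov)

    # Bures-Wasserstein step for the covariance: move along the retraction
    # Sigma -> (I - eta G) Sigma (I - eta G) with G the symmetrised gradient,
    # which preserves symmetry and, for small eta, positive definiteness.
    G = 0.5 * (grad_Sigma + grad_Sigma.T)
    M = np.eye(Sigma.shape[0]) - eta_cov * G
    Sigma_next = M @ Sigma @ M.T

    return theta_next, Sigma_next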
The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for...
Learning in stochastic games is a notoriously difficult prob...
This paper is about the exploitation of Lipschitz continuity properties for Markov Decision Processe...
Despite its popularity in the reinforcement learning community, a provably convergent policy gradien...
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic reg...
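As a concrete point of reference for the kind of method this entry studies, the following sketch runs plain gradient descent on the discrete-time LQR cost with static feedback u_t = -K x_t. The Lyapunov-sum expressions for the cost and its gradient are standard; the truncation horizon, step size, and the assumption that the initial gain is stabilising are illustrative choices, not details taken from the cited work.

import numpy as np

def lqr_cost_and_gradient(A, B, Q, R, K, Sigma0, horizon=500):
    """Cost and exact policy gradient for discrete-time LQR with u_t = -K x_t.

    A finite truncation of the Lyapunov sums approximates the infinite-horizon
    quantities; this assumes the closed loop A - B K is stable.
    """
    n = A.shape[0]
    Acl = A - B @ K                          # closed-loop dynamics
    cost_mat = Q + K.T @ R @ K
    P = np.zeros((n, n))                     # sum_t (Acl^T)^t cost_mat Acl^t
    Sigma = np.zeros((n, n))                 # sum_t Acl^t Sigma0 (Acl^T)^t
    Mt, St = np.eye(n), Sigma0.copy()
    for _ in range(horizon):
        P += Mt.T @ cost_mat @ Mt
        Sigma += St
        Mt = Acl @ Mt
        St = Acl @ St @ Acl.T
    cost = np.trace(P @ Sigma0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return cost, grad

def policy_gradient_lqr(A, B, Q, R, K0, Sigma0, eta=1e-3, iters=200):
    """Plain gradient descent on the LQR cost, assuming K0 is stabilising."""
    K = K0.copy()
    for _ in range(iters):
        _, grad = lqr_cost_and_gradient(A, B, Q, R, K, Sigma0)
        K = K - eta * grad
    return K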
We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with prov...
This paper investigates an infinite-horizon linear quadratic stochastic (LQS) optimal control proble...
While the optimization landscape of policy gradient methods has been recently investigated for parti...
The linear quadratic framework is widely studied in the literature on stochastic control and game th...
Policy search is a method for approximately solving an optimal control problem...
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of...
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the develop...
We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-line...
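For context, one common form of that algorithm updates a log-linear (softmax-over-features) policy along the compatible-function-approximation direction. The sketch below is a generic illustration of that update; the sampling scheme, feature layout, and ridge term are assumptions for the example rather than details from the cited work.

import numpy as np

def npg_loglinear_step(theta, phi, states, advantages, eta=0.1, ridge=1e-6):
    """One natural policy gradient step for a log-linear (softmax) policy
    pi_theta(a|s) proportional to exp(theta . phi(s, a)).

    Assumed inputs (illustrative): `states` is a list of sampled state indices,
    `phi[s]` is an array of shape (num_actions, d) of feature vectors, and
    `advantages[s][a]` holds an estimated advantage.
    """
    d = theta.shape[0]
    # Compatible function approximation: the update direction w solves
    #   min_w E[(A(s, a) - w . grad_theta log pi_theta(a|s))^2],
    # where grad log pi(a|s) = phi(s, a) - E_{a'~pi}[phi(s, a')].
    G = np.zeros((d, d))
    b = np.zeros(d)
    for s in states:
        logits = phi[s] @ theta                   # shape (num_actions,)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        phi_bar = probs @ phi[s]                  # E_{a~pi}[phi(s, a)]
        for a, p in enumerate(probs):
            g = phi[s][a] - phi_bar               # score function
            G += p * np.outer(g, g)
            b += p * advantages[s][a] * g
    w = np.linalg.solve(G + ridge * np.eye(d), b)
    return theta + eta * w                        # theta_{t+1}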
We study the distributed Linear Quadratic Gaussian (LQG) control problem in discrete-time and finite...
Thesis (Ph.D.), University of Washington, 2020. In this thesis, we shall study optimal control problem...