Many recent Trajectory Optimization algorithms alternate between a local approximation of the dynamics and a conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the policy update to be derived in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, and it demonstrates improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics.
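To make the closed-form claim concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear-Gaussian previous policy q(a|s) = N(Ks + k, Sigma), a Q-function that is quadratic and concave in the action, and an update of the form pi(a|s) proportional to q(a|s) exp(Q(s,a)/eta). All names (`kl_regularized_update`, `gaussian_kl`, `Qaa`, `Qas`, `qa`, `eta`) are hypothetical. In KL-constrained updates of this kind the temperature `eta` is typically obtained by solving a dual problem so that the KL bound is met; here it is simply an input.

```python
import numpy as np


def kl_regularized_update(K, k, Sigma, Qaa, Qas, qa, eta):
    """Return (K_new, k_new, Sigma_new) for pi(a|s) proportional to q(a|s) * exp(Q(s,a)/eta).

    Assumed forms (illustrative, not taken from the paper):
      q(a|s)  = N(K s + k, Sigma)
      Q(s, a) = 0.5 a^T Qaa a + a^T (Qas s + qa) + terms independent of a
    with Qaa symmetric negative semi-definite and eta > 0.
    """
    prec_old = np.linalg.inv(Sigma)
    # Adding the two quadratic forms in a gives the new precision matrix.
    prec_new = prec_old - Qaa / eta
    Sigma_new = np.linalg.inv(prec_new)
    # Matching the terms linear in a gives the new gain and offset.
    K_new = Sigma_new @ (prec_old @ K + Qas / eta)
    k_new = Sigma_new @ (prec_old @ k + qa / eta)
    return K_new, k_new, Sigma_new


def gaussian_kl(m_p, S_p, m_q, S_q):
    """KL( N(m_p, S_p) || N(m_q, S_q) ) between two multivariate Gaussians."""
    d = m_p.shape[0]
    S_q_inv = np.linalg.inv(S_q)
    diff = m_q - m_p
    _, logdet_p = np.linalg.slogdet(S_p)
    _, logdet_q = np.linalg.slogdet(S_q)
    return 0.5 * (np.trace(S_q_inv @ S_p) + diff @ S_q_inv @ diff - d
                  + logdet_q - logdet_p)


if __name__ == "__main__":
    # Toy problem: 2-D state, 2-D action, made-up Q-function coefficients.
    K, k, Sigma = np.zeros((2, 2)), np.zeros(2), np.eye(2)
    Qaa, Qas, qa = -2.0 * np.eye(2), np.eye(2), np.array([0.5, -0.5])
    s = np.array([1.0, -1.0])
    for eta in (1.0, 10.0, 100.0):
        K_n, k_n, S_n = kl_regularized_update(K, k, Sigma, Qaa, Qas, qa, eta)
        kl = gaussian_kl(K_n @ s + k_n, S_n, K @ s + k, Sigma)
        print(f"eta={eta:6.1f}  KL(pi || q) at s = {kl:.5f}")
```

A larger `eta` keeps the new policy closer to the old one, so the KL divergence at a given state shrinks as `eta` grows. Note that no dynamics model appears anywhere in the update, which is the point the abstract makes about avoiding linearized dynamics.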
Embedding an optimization process has been explored for imposing efficient and flexible policy struc...
Recent advances in machine learning, simulation, algorithm design, and computer hardware have allowe...
Model-free reinforcement learning based methods such as Proximal Policy Optimization, or Q-learning ...
Many of the recent trajectory optimization algorithms alternate between linear approximation of the...
Reinforcement learning (RL) and trajectory optimization (TO) present strong complementary advantages...
Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of...
This paper presents a new problem solving approach that is able to generate optimal policy solution ...
The Iterative Linear Quadratic Regulator (ILQR), a variant of Differential Dynamic Programming (DDP)...
This is the version of record. It originally appeared on arXiv at http://arxiv.org/abs/1603.00748. Mo...
Iterative linear quadratic regulator (iLQR) has gained wide popularity in addressing trajectory opti...
Reinforcement learning (RL) has demonstrated impressive performance in various domains such as video...
In this work, we study model-based reinforcement learning (RL) in unknown stabilizable linear dynami...
Deep reinforcement learning uses simulators as abstract oracles to interact with the environment. In...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...