In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. A variational decomposition of a maximum likelihood policy objective allows us to use standard trajectory optimization algorithms such as differential dynamic programming, interleaved with standard supervised learning for the policy itself....
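The alternation described in this abstract, trajectory optimization followed by supervised learning of the policy, can be sketched under strong simplifying assumptions. The sketch uses a hypothetical scalar linear system `x' = a*x + b*u` with a quadratic cost. An exact LQR backward pass stands in for differential dynamic programming. A quadratic penalty `lam*(u - k*x)**2` pulls the optimized trajectories toward the current policy, which is a simplification and not the paper's variational decomposition. The policy `u = k*x` is then fit to the trajectory samples by least squares. All function names and parameter values are illustrative.

```python
def lqr_gains(a, b, q, r, qf, lam, k, horizon):
    """Backward pass for scalar LQR on the augmented stage cost
    q*x^2 + r*u^2 + lam*(u - k*x)^2 with terminal cost qf*x^2.
    Returns time-indexed feedback gains K_t such that u_t = K_t * x_t."""
    qx = q + lam * k * k  # coefficient of x^2 in the stage cost
    ru = r + lam          # coefficient of u^2 in the stage cost
    n = -lam * k          # half the coefficient of x*u (cross term)
    p = qf                # value function at time t+1: V(x) = p * x^2
    gains = [0.0] * horizon
    for t in reversed(range(horizon)):
        quu = ru + p * b * b
        qux = n + p * a * b
        gains[t] = -qux / quu
        p = qx + p * a * a - qux * qux / quu  # Riccati update
    return gains


def rollout(a, b, gains, x0):
    """Simulate x_{t+1} = a*x_t + b*u_t under the time-varying gains."""
    xs, us, x = [], [], x0
    for gain in gains:
        u = gain * x
        xs.append(x)
        us.append(u)
        x = a * x + b * u
    return xs, us


def fit_linear_policy(xs, us):
    """Supervised step: least-squares fit of u = k*x through the origin."""
    return sum(x * u for x, u in zip(xs, us)) / sum(x * x for x in xs)


def guided_search(a=1.2, b=1.0, q=1.0, r=0.1, qf=1.0, lam=1.0,
                  horizon=20, iters=10, starts=(-2.0, -1.0, 1.0, 2.0)):
    k = 0.0  # initial policy: apply no control
    for _ in range(iters):
        # 1) Trajectory optimization, regularized toward the current policy.
        gains = lqr_gains(a, b, q, r, qf, lam, k, horizon)
        xs, us = [], []
        for x0 in starts:
            sx, su = rollout(a, b, gains, x0)
            xs += sx
            us += su
        # 2) Supervised learning of the policy from the optimized samples.
        k = fit_linear_policy(xs, us)
    return k


k = guided_search()
print(f"learned gain k = {k:.3f}, closed-loop pole = {1.2 + k:.3f}")
```

With `a = 1.2` the uncontrolled system is unstable. The learned gain should place the closed-loop pole `a + b*k` inside the unit interval. Because the fitted `k` is an `x_t^2`-weighted average of the per-step LQR gains, the policy inherits the trajectories' behavior rather than being searched for directly in parameter space.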
Computational agents often need to learn policies that involve many control variables, e.g., a robot...
This paper reviews a variety of ways to use trajectory optimization to accelerate dynamic programmin...
Search missions require motion planning and navigation methods for information gathering that contin...
Reinforcement learning and policy search methods can in principle solve a wide range of con...
Policy search methods can in principle learn controllers for a wide range of locomotion tasks automa...
Direct policy search methods offer the promise of automatically learning controllers for complex, h...
We present a policy search method that uses iteratively refitted local linear models to optimize tra...
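The core operation of fitting a local linear model can be illustrated as ordinary least squares on sampled transitions. The sketch fits `x' ≈ A*x + B*u + c` at a single time step of a hypothetical scalar system. It does not reproduce the paper's procedure, which fits models along trajectories and refits them iteratively. The helper names `solve` and `fit_linear_dynamics` are illustrative.

```python
def solve(M, v):
    """Solve the square system M @ sol = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    sol = [0.0] * n
    for r in reversed(range(n)):
        sol[r] = (A[r][n] - sum(A[r][c] * sol[c] for c in range(r + 1, n))) / A[r][r]
    return sol


def fit_linear_dynamics(samples):
    """samples: list of (x, u, x_next). Returns (A, B, c) minimizing
    sum over samples of (x_next - A*x - B*u - c)^2 via the normal equations."""
    feats = [(x, u, 1.0) for x, u, _ in samples]
    targets = [xn for _, _, xn in samples]
    M = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
    v = [sum(f[i] * y for f, y in zip(feats, targets)) for i in range(3)]
    return tuple(solve(M, v))


# Noise-free transitions from a known linear system, to check the fit.
true_A, true_B, true_c = 0.9, 0.5, 0.1
samples = [(x, u, true_A * x + true_B * u + true_c)
           for x in (-1.0, 0.0, 1.0, 2.0) for u in (-1.0, 0.5, 2.0)]
A_hat, B_hat, c_hat = fit_linear_dynamics(samples)
print(A_hat, B_hat, c_hat)
```

On noise-free data the fit recovers the generating coefficients. With noisy samples from a nonlinear system, it would instead return the best local linear approximation around the sampled states and controls.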
The Iterative Linear Quadratic Regulator (ILQR), a variant of Differential Dynamic Programming (DDP)...
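The structure of iLQR can be shown on a small problem. Each iteration linearizes the dynamics and quadratizes the cost around the current trajectory. A backward pass then computes feedforward terms `k_t` and feedback gains `K_t`, and a forward pass applies them with a line search. iLQR drops the second derivatives of the dynamics that full DDP keeps. The sketch below is a minimal illustration assuming hypothetical scalar dynamics `x' = x + DT*(sin(x) + u)` and a quadratic cost. The constants, names, and fixed step-size schedule are illustrative choices, not taken from the abstract.

```python
import math

DT = 0.1
Q, R, QF = 1.0, 0.1, 10.0  # state, control, and terminal cost weights


def step(x, u):
    """Hypothetical scalar dynamics: x' = x + DT*(sin x + u)."""
    return x + DT * (math.sin(x) + u)


def simulate(x0, us):
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return xs


def total_cost(xs, us):
    running = sum(Q * x * x + R * u * u for x, u in zip(xs[:-1], us))
    return running + QF * xs[-1] ** 2


def ilqr(x0, horizon=30, iters=50, reg=1e-6):
    us = [0.0] * horizon
    xs = simulate(x0, us)
    cost = total_cost(xs, us)
    for _ in range(iters):
        # Backward pass: first-order dynamics, second-order cost.
        vx, vxx = 2 * QF * xs[-1], 2 * QF
        ks, Ks = [0.0] * horizon, [0.0] * horizon
        for t in reversed(range(horizon)):
            fx, fu = 1.0 + DT * math.cos(xs[t]), DT
            qx = 2 * Q * xs[t] + fx * vx
            qu = 2 * R * us[t] + fu * vx
            qxx = 2 * Q + fx * fx * vxx
            quu = 2 * R + fu * fu * vxx + reg
            qux = fu * fx * vxx
            ks[t], Ks[t] = -qu / quu, -qux / quu
            vx = qx - qux * qu / quu
            vxx = qxx - qux * qux / quu
        # Forward pass with backtracking on the feedforward step size.
        improved = False
        for alpha in (1.0, 0.5, 0.25, 0.125, 0.0625):
            new_xs, new_us = [x0], []
            for t in range(horizon):
                u = us[t] + alpha * ks[t] + Ks[t] * (new_xs[t] - xs[t])
                new_us.append(u)
                new_xs.append(step(new_xs[t], u))
            new_cost = total_cost(new_xs, new_us)
            if new_cost < cost:
                xs, us, cost, improved = new_xs, new_us, new_cost, True
                break
        if not improved:
            break  # no step size reduces the cost: treat as converged
    return xs, us, cost


xs, us, cost = ilqr(1.0)
initial_cost = total_cost(simulate(1.0, [0.0] * 30), [0.0] * 30)
print(f"cost {initial_cost:.2f} -> {cost:.2f}, final state {xs[-1]:.4f}")
```

Without control, the state drifts from 1.0 toward the stable point at pi. iLQR instead finds controls that drive it toward the origin.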
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Com...
We consider the policy search approach to reinforcement learning. We show that if a “baseline distri...
Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain target dis...
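When both distributions are Gaussian, the KL divergence has a closed form. This is one reason Gaussian policies and trajectory distributions are a common choice in this setting. The snippet does not state which distribution family it assumes, so the sketch below assumes univariate Gaussians purely for illustration. It also shows that KL is asymmetric, so the direction being minimized matters.

```python
import math


def kl_gaussian(m1, s1, m2, s2):
    """KL( N(m1, s1^2) || N(m2, s2^2) ) for univariate Gaussians with s1, s2 > 0."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


same = kl_gaussian(0.0, 1.0, 0.0, 1.0)     # identical distributions
shifted = kl_gaussian(0.0, 1.0, 1.0, 1.0)  # unit mean shift, equal variance
narrow_to_wide = kl_gaussian(0.0, 1.0, 0.0, 2.0)
wide_to_narrow = kl_gaussian(0.0, 2.0, 0.0, 1.0)
print(same, shifted, narrow_to_wide, wide_to_narrow)
```

Placing a wide distribution under a narrow one (`wide_to_narrow`) is penalized more than the reverse. This asymmetry is why minimizing KL(p||q) tends to produce mass-covering fits, while minimizing KL(q||p) tends to produce mode-seeking fits.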
Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of...
We present an Imitation Learning approach for the control of dynamical systems with a known model. ...
This paper presents a novel trajectory planning algorithm for nonlinear dynamical systems evolving ...