State-Regularized Policy Search for Linearized Dynamical Systems

Abdulsamad, Hany
Arenz, Oleg
Peters, Jan
Neumann, Gerhard


Publication date

June 2017

DOI

10.1609/icaps.v27i1.13853

Publisher

Association for the Advancement of Artificial Intelligence

Abstract

Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, rendering them hard to apply to highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step-size of the policy update and could even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policie...
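The relative-entropy bound between consecutive policies mentioned in the abstract can be made concrete in a toy scalar setting. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it assumes scalar states and actions, linear-Gaussian policies u ~ N(K x + k, s), and hypothetical linear dynamics x' = a x + b u + w with w ~ N(0, q). The helper names (`kl_gauss`, `expected_policy_kl`, `rollout_state_marginals`) and all parameter values are illustrative only. The sketch computes the expected KL between a new and an old policy and propagates the Gaussian state marginal under each policy, so the two quantities can be compared.

```python
import math


def kl_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for univariate Gaussians p = N(mu_p, var_p), q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def expected_policy_kl(new, old, state_mean, state_var):
    """E_x[KL(pi_new(.|x) || pi_old(.|x))] for scalar linear-Gaussian policies.

    Each policy is a hypothetical (K, k, s) tuple meaning u ~ N(K*x + k, s);
    states are assumed distributed as x ~ N(state_mean, state_var).
    """
    K1, k1, s1 = new
    K0, k0, s0 = old
    dK, dk = K1 - K0, k1 - k0
    # E[(dK*x + dk)^2] = (dK*m + dk)^2 + dK^2 * v for x ~ N(m, v)
    mean_diff_sq = (dK * state_mean + dk) ** 2 + dK ** 2 * state_var
    return 0.5 * (math.log(s0 / s1) + (s1 + mean_diff_sq) / s0 - 1.0)


def rollout_state_marginals(policy, a, b, q, m0, v0, horizon):
    """Propagate a Gaussian state marginal through assumed scalar linear dynamics
    x' = a*x + b*u + w, w ~ N(0, q), under u ~ N(K*x + k, s).

    Returns a list of (mean, variance) pairs for time steps 0..horizon.
    """
    K, k, s = policy
    m, v = m0, v0
    marginals = [(m, v)]
    for _ in range(horizon):
        closed_loop = a + b * K  # closed-loop gain of the linear system
        m = closed_loop * m + b * k
        v = closed_loop ** 2 * v + b ** 2 * s + q  # policy noise and process noise add variance
        marginals.append((m, v))
    return marginals


if __name__ == "__main__":
    # Illustrative parameters only; not taken from the paper.
    old = (-0.5, 0.0, 0.1)
    new = (-0.4, 0.05, 0.1)
    print("expected policy KL:", expected_policy_kl(new, old, 1.0, 0.2))
    traj_old = rollout_state_marginals(old, 1.0, 0.5, 0.01, 1.0, 0.2, 20)
    traj_new = rollout_state_marginals(new, 1.0, 0.5, 0.01, 1.0, 0.2, 20)
    print("final-state KL:", kl_gauss(*traj_new[-1], *traj_old[-1]))
```

Comparing the expected policy KL with the KL between the final state marginals gives a rough feel, in this toy setting only, for how a constraint in policy space relates to the induced change in the state distribution over a horizon.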
