Trajectory-Centric Reinforcement Learning and Trajectory Optimization methods optimize a sequence of feedback controllers by taking advantage of local approximations of model dynamics and cost functions. Stability of the policy update is a major issue for these methods, which makes them hard to apply to highly nonlinear systems. Recent approaches combine classical Stochastic Optimal Control methods with information-theoretic bounds to control the step size of the policy update and can even be used to train nonlinear deep control policies. These methods bound the relative entropy between the new and the old policy to ensure a stable policy update. However, despite the bound in policy space, the state distributions of two consecutive policie...
Reinforcement learning (RL) is an important field of research in machine learning that is increasing...
In this thesis, we study the related problems of reinforcement learning and optimal adaptive control...
The Iterative Linear Quadratic Regulator (ILQR), a variant of Differential Dynamic Programming (DDP)...
Many Stochastic Optimal Control (SOC) approaches rely on samples to either obtain an estimate of the value function or a linearisation of th...
Many of the recent Trajectory Optimization algorithms alternate between local approximation of the d...
In this paper, we consider the control problem in a reinforcement learning setting with large state ...
Stochastic Optimal Control (SOC) is typically used to plan a movement for a specific situation. Whil...
This paper presents a new problem solving approach that is able to generate optimal policy solution ...
In this article, we present a generalized view on Path Integral Control (PIC) methods. PIC refers to...
The framework of dynamic programming (DP) and reinforcement learning (RL) can be used to express imp...
Many Stochastic Optimal Control (SOC) approaches rely on samples to either obtain an estimate of th...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Many of the recent trajectory optimization algorithms alternate between linear approximation of the ...
For controlling high-dimensional robots, most stochastic optimal control algorithms use a...