Many Stochastic Optimal Control (SOC) approaches rely on samples, either to estimate the value function or to linearise the underlying system model. However, these approaches typically neglect the fact that the accuracy of a policy update depends on how close the resulting trajectory distribution stays to those samples. The greedy operator enforces no such closeness constraint, and can therefore cause oscillations or even instabilities in the policy updates; such behaviour is likely to degrade the performance of the estimated policy. Taking inspiration from the reinforcement learning community, we relax the greedy operator used in SOC with an information-theoretic bound on how far each policy update may move the trajectory distribution.
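A minimal sketch of the kind of information-theoretic relaxation described above, assuming a REPS-style (Relative Entropy Policy Search) formulation: sampled trajectory returns R_i are reweighted as w_i proportional to exp(R_i / eta), with the temperature eta chosen by minimising the dual g(eta) = eta * epsilon + eta * log mean_i exp(R_i / eta), so that the reweighted distribution stays within KL divergence epsilon of the uniform sample distribution. Everything below (the function name kl_bounded_weights, the choice of epsilon, the Nelder-Mead solver) is an illustrative assumption, not the paper's actual algorithm.

    import numpy as np
    from scipy.optimize import minimize

    def kl_bounded_weights(returns, epsilon=0.5):
        # Soft-greedy update: reweight sampled trajectories so the implied
        # distribution stays within KL divergence `epsilon` of the (uniform)
        # sampling distribution, rather than jumping greedily to the best sample.
        R = np.asarray(returns, dtype=float)
        R = R - R.max()  # weights are shift-invariant; this avoids overflow

        def dual(log_eta):
            # REPS-style dual, minimised over the temperature eta > 0
            # (optimised in log-space so eta stays positive).
            eta = float(np.exp(np.clip(log_eta[0], -10.0, 10.0)))
            return eta * epsilon + eta * np.log(np.mean(np.exp(R / eta)))

        res = minimize(dual, x0=[0.0], method="Nelder-Mead")
        eta = float(np.exp(np.clip(res.x[0], -10.0, 10.0)))
        w = np.exp(R / eta)
        return w / w.sum(), eta

    # Example: weights for a relaxed policy update from 100 sampled returns.
    rng = np.random.default_rng(0)
    weights, eta = kl_bounded_weights(rng.normal(size=100), epsilon=0.5)

A small epsilon forces nearly uniform weights (conservative updates that stay close to the samples), while a large epsilon recovers the greedy operator whose oscillations the abstract warns against.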