Embedding an optimization process has been explored as a way to impose efficient and flexible policy structures. Existing work often builds upon nonlinear optimization with explicit unrolling of iteration steps, making policy inference prohibitively expensive for online learning and real-time control. Our approach embeds a linear-quadratic-regulator (LQR) formulation with a Koopman representation, thus combining the tractability of a closed-form solution with the richness of a non-convex neural network. We use a few auxiliary objectives and a reparameterization to enforce optimality conditions of the policy, which can be easily integrated into standard gradient-based learning. Our approach is shown to be effective for learning policies rendering an op...
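To make the idea of an LQR policy acting on a Koopman representation concrete, here is a minimal sketch under strong simplifying assumptions. It is not the authors' implementation: the abstract describes a learned neural-network embedding trained with auxiliary objectives, whereas this example uses a hand-written lifting `lift` and a textbook toy system whose lifted dynamics happen to be exactly linear. All names and constants (`lift`, `step`, `lqr_gain`, `a`, `b`, `c`, `Q`, `R`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy nonlinear system (an assumption, not taken from the paper):
#   x1' = a*x1
#   x2' = b*x2 + c*x1**2 + u
a, b, c = 0.9, 1.1, 0.5

def step(x, u):
    """Advance the nonlinear system by one time step."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2 + u])

def lift(x):
    """Hand-written lifting z = g(x); stands in for a learned encoder."""
    return np.array([x[0], x[1], x[0] ** 2])

# In the lifted coordinates z = (x1, x2, x1^2) the dynamics are linear:
#   z' = A z + B u
A = np.array([[a, 0.0, 0.0],
              [0.0, b, c],
              [0.0, 0.0, a ** 2]])
B = np.array([[0.0], [1.0], [0.0]])

def lqr_gain(A, B, Q, R, iters=500):
    """Discrete-time infinite-horizon LQR gain via Riccati value iteration."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)  # K = (R + B'PB)^-1 B'PA
        P = Q + A.T @ P @ (A - B @ K)              # Riccati recursion
    return K, P

Q = np.eye(3)   # state cost in the lifted space (illustrative choice)
R = np.eye(1)   # control cost (illustrative choice)
K, P = lqr_gain(A, B, Q, R)

# Policy: a linear feedback in the lifted space, nonlinear in the original state x.
x = np.array([1.0, -0.5])
for _ in range(100):
    u = -(K @ lift(x))[0]
    x = step(x, u)
```

Inference here costs one encoder evaluation and one matrix-vector product per step, which is the tractability the abstract contrasts with unrolled nonlinear optimization. In a learned setting, `lift`, `A`, and `B` would presumably be trainable, which this sketch does not attempt.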
In real-world robotic applications, many factors, both at low level (e.g., vision, motion control an...
Learning methods to enable high performance control systems have recently shown promising results in...
This paper presents a constrained policy gradient algorithm. We introduce constraints for safe learn...
The Iterative Linear Quadratic Regulator (ILQR), a variant of Differential Dynamic Programming (DDP)...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
Data-driven control of nonlinear dynamical systems is a largely open problem. In this paper, buildin...
In robotics, elementary behaviour patterns often tackle control theoretic problems. Because of incom...
Recently, the Koopman operator has become a promising data-driven tool to facilitate real-time control fo...
Reinforcement learning and policy search methods can in principle solve a wide range of con...
We propose a novel framework for learning linear time-invariant (LTI) models for a class of continuo...
In real-world robotic applications, many factors, both at low-level (e.g., vision and motion control...
We present an Imitation Learning approach for the control of dynamical systems with a known model. ...
Reinforcement learning offers a general framework to explain reward-related learning in artificial a...
Applying reinforcement learning to control systems enables the use of machine learning to develop el...
Gradient-based methods have been widely used for system design and optimization in diverse applicati...