In this chapter, we extend the adaptive dynamic programming (ADP) algorithm dual heuristic programming (DHP) to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to increased stability in the learning o...
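The role of λ can be illustrated with a minimal sketch. It assumes that the target value gradient is built by a backward recursion that blends the next target (weight λ) with the critic's own gradient estimate (weight 1 − λ); that recursion, the scalar closed-loop system `step`, the reward `reward`, the zero terminal target, and all function names are illustrative assumptions, not taken from the text above.

```python
# Hypothetical scalar closed-loop system (policy already folded into the dynamics).
def step(x):
    return 0.9 * x


def reward(x):
    return -x * x


def dr_dx(x):
    # Derivative of the assumed reward with respect to the state.
    return -2.0 * x


def df_dx(x):
    # Derivative of the assumed dynamics with respect to the state.
    return 0.9


def rollout(x0, horizon):
    """Return the states x_0 .. x_horizon of one trajectory."""
    xs = [x0]
    for _ in range(horizon):
        xs.append(step(xs[-1]))
    return xs


def vgl_targets(xs, critic_grad, dr_dx, df_dx, lam, gamma=1.0):
    """Backward recursion for lambda-blended target value gradients.

    Assumed form: G'_t = dr/dx + gamma * df/dx * (lam * G'_{t+1}
                                                  + (1 - lam) * critic_grad(x_{t+1})),
    with the terminal target taken as zero.
    """
    horizon = len(xs) - 1
    targets = [0.0] * (horizon + 1)
    for t in range(horizon - 1, -1, -1):
        blended = lam * targets[t + 1] + (1.0 - lam) * critic_grad(xs[t + 1])
        targets[t] = dr_dx(xs[t]) + gamma * df_dx(xs[t]) * blended
    return targets


if __name__ == "__main__":
    xs = rollout(1.0, 5)
    untrained_critic = lambda x: 0.0  # placeholder critic gradient
    print(vgl_targets(xs, untrained_critic, dr_dx, df_dx, lam=1.0))
    print(vgl_targets(xs, untrained_critic, dr_dx, df_dx, lam=0.0))
```

Under these assumptions, λ = 1 never consults the critic, so each target is the exact derivative of the summed reward along the trajectory, which is the quantity BPTT differentiates. With λ = 0 each target relies on the critic's estimate one step ahead, in the manner of a one-step DHP-style target.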
Reinforcement learning is often done using parameterized function approximators to store value funct...
Reinforcement learning (RL) is generally considered as the machine learning an...
Gradient-based methods have been widely used for system design and optimization in diverse applicati...
We consider the adaptive dynamic programming technique called Dual Heuristic Programming (DHP), whic...
We describe an Adaptive Dynamic Programming algorithm VGL(λ) for learning a critic function over a l...
In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP)...
Adaptive critic methods for reinforcement learning are known to provide consistent solutions to opti...
Whether animals behave optimally is an open question of great importance, both theoretically and in ...
A number of success stories have been told where reinforcement learning has been applied to problems...
The concept of value templates and perceptual learning are introduced as refinements to the reinforc...
Approximate dynamic programming approaches to the reinforcement learning problem are often categoriz...
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide ...
Reinforcement learning, mathematically described by Markov Decision Problems, may be approached eith...
In adaptive dynamic programming, neurocontrol, and reinforcement learning, the objective is for an a...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...