Approximating Optimal Control with Value Gradient Learning

Fairbank, M.
Prokhorov, D.
Alonso, E.

Publication date

February 2013

DOI

10.1002/9781118453988

Publisher

Wiley

Abstract

In this chapter, we extend the ADP algorithm dual heuristic programming (DHP) to include a “bootstrapping” parameter λ, analogous to that used in the reinforcement learning algorithm TD(λ). The resulting algorithm, which we call VGL(λ) for value-gradient learning, is proven to produce a weight update that can be equivalent to backpropagation through time (BPTT) applied to a greedy policy on a critic function. This provides a surprising connection between the two alternative methods of BPTT and DHP. Under certain smoothness conditions, VGL(λ=1) with a greedy policy acquires the strong convergence conditions of BPTT, while using a general function approximator for the critic. We show that this can lead to increased stability in the learning o...
