In problems with complex dynamics and challenging state spaces, the dual heuristic programming (DHP) algorithm has been shown, both theoretically and experimentally, to perform well. DHP was recently extended by an approach called value gradient learning (VGL). VGL was inspired by a version of temporal difference (TD) learning that uses eligibility traces, which apply an exponential decay, governed by a decay parameter λ, to older observations. This approach is known as TD(λ), and its DHP extension is known as VGL(λ), where VGL(0) is identical to DHP. VGL has demonstrated convergence and other desirable properties, but it is primarily suited to batch learning. Online learning requires an eligibility-trace work-space matrix, which is not ...
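To make the role of the decay parameter λ concrete, the sketch below shows the standard accumulating-trace TD(λ) update for a linear value-function approximator; the variable names and hyperparameter values are illustrative assumptions, not the formulation used in this work. Setting λ = 0 collapses the trace to the current feature vector alone, recovering one-step TD in the same way that VGL(0) recovers DHP.

```python
import numpy as np

def td_lambda_update(theta, trace, phi_s, phi_s_next, reward,
                     alpha=0.1, gamma=0.95, lam=0.7):
    """One online TD(lambda) step with accumulating eligibility traces.

    theta      : weights of the linear value function V(s) = theta . phi(s)
    trace      : eligibility-trace vector (same shape as theta)
    phi_s      : feature vector of the current state
    phi_s_next : feature vector of the next state
    (All names and defaults here are illustrative assumptions.)
    """
    # Temporal-difference error for this transition.
    delta = reward + gamma * np.dot(theta, phi_s_next) - np.dot(theta, phi_s)

    # Older observations decay exponentially through the factor gamma * lambda;
    # with lam = 0 the trace reduces to the current features only.
    trace = gamma * lam * trace + phi_s

    # Credit the TD error to all recently visited features.
    theta = theta + alpha * delta * trace
    return theta, trace
```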