Greedy adaptive critics for LPQ [i.e. LQR] problems : Convergence Proofs

Landelius, Tomas
Knutsson, Hans

Publication date

January 1996

Publisher

Linköping, Sweden : Linköping University, Department of Electrical Engineering

Abstract

A number of success stories have been told where reinforcement learning has been applied to problems in continuous state spaces using neural nets or other sorts of function approximators in the adaptive critics. However, the theoretical understanding of why and when these algorithms work is inadequate. This is clearly exemplified by the lack of convergence results for a number of important situations. To our knowledge, only two such results have been presented for systems in the continuous state space domain. The first is due to Werbos and is concerned with linear function approximation and heuristic dynamic programming. Here no optimal strategy can be found, which is why the result is of limited importance. The second result is due to Bradtke and deals wi...
