Learning to predict by the methods of temporal differences

Richard S. Sutton

Publication date

January 1988

Abstract

This article introduces a class of incremental learning procedures specialized for prediction, that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. ...
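The contrast the abstract draws, credit from the gap between prediction and final outcome versus credit from the gap between successive predictions, can be illustrated with a minimal sketch. It assumes tabular predictions stored in a dict keyed by hypothetical state labels, a single scalar outcome z at the end of each sequence, a step size alpha, and online updates applied one step at a time. The temporal-difference rule shown is the simplest member of the class (commonly called TD(0)); this is an illustration under those assumptions, not the article's own formulation or code.

```python
from typing import Dict, List, Tuple

# An episode: the sequence of observed states and the final outcome z.
# State labels and the tabular value dict are illustrative assumptions.
Episode = Tuple[List[str], float]


def supervised_update(V: Dict[str, float], episode: Episode, alpha: float) -> None:
    """Outcome-based credit: move each prediction toward the actual outcome z."""
    states, z = episode
    for s in states:
        V[s] += alpha * (z - V[s])


def td0_update(V: Dict[str, float], episode: Episode, alpha: float) -> None:
    """Temporal-difference credit: move each prediction toward the next prediction.

    The final prediction has no successor, so it is compared with the outcome z.
    Updates are applied online, one step at a time (an assumption of this sketch).
    """
    states, z = episode
    for t, s in enumerate(states):
        target = V[states[t + 1]] if t + 1 < len(states) else z
        V[s] += alpha * (target - V[s])


if __name__ == "__main__":
    episode: Episode = (["A", "B"], 1.0)
    V_sup = {"A": 0.5, "B": 0.5}
    V_td = {"A": 0.5, "B": 0.5}
    supervised_update(V_sup, episode, alpha=0.1)
    td0_update(V_td, episode, alpha=0.1)
    print("supervised:", V_sup)
    print("TD(0):     ", V_td)
```

In this run, the supervised rule moves both predictions toward the outcome z, while the TD rule leaves A unchanged because its prediction already agreed with its successor's; only B, whose successor is the outcome itself, is corrected.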
