Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data efficient than conventional TD algorithms such as TD(0) and is more computationally efficient than non-incremental least-squares TD methods such as LSTD (Bradtke & Barto 1996; Boyan 1999). In particular, we show that the per-time-step complexities of iLSTD and TD(0) are O(n), where n is the number of features, whereas that of LSTD is O(n^2). This difference can be decisive in modern applications of reinforcement learn...
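To make the complexity contrast in the abstract concrete, the following is a minimal sketch of the two baseline methods it names, TD(0) and LSTD, for linear value functions. It does not implement iLSTD, whose details are not given in the text above. The function and class names, the step size `alpha`, the discount `gamma`, and the initialization constant `epsilon` are illustrative assumptions, and the recursive (Sherman-Morrison) form of LSTD is one common way to reach O(n^2) per step rather than necessarily the formulation used by the authors.

```python
import numpy as np


def td0_update(theta, phi, reward, phi_next, alpha=0.1, gamma=0.9):
    """One TD(0) step for a linear value function: O(n) work for n features."""
    td_error = reward + gamma * (phi_next @ theta) - (phi @ theta)
    return theta + alpha * td_error * phi


class RecursiveLSTD:
    """LSTD(0) that keeps A^{-1} up to date with the Sherman-Morrison formula.

    Each update touches an n-by-n matrix, so the cost is O(n^2) per step.
    """

    def __init__(self, n, gamma=0.9, epsilon=1e-3):
        self.gamma = gamma
        # Assumed initialization: A starts as epsilon * I so it is invertible.
        self.A_inv = np.eye(n) / epsilon
        self.b = np.zeros(n)

    def update(self, phi, reward, phi_next):
        d = phi - self.gamma * phi_next
        # A <- A + phi d^T, applied directly to the stored inverse.
        A_inv_phi = self.A_inv @ phi
        d_A_inv = d @ self.A_inv
        self.A_inv -= np.outer(A_inv_phi, d_A_inv) / (1.0 + d_A_inv @ phi)
        self.b += reward * phi
        # Current least-squares weight estimate.
        return self.A_inv @ self.b
```

On a single state with a self-loop, feature 1, reward 1, and gamma 0.9, both estimators should approach the true value of 10; TD(0) needs many small steps to get there, while LSTD pays more per step for a more data-efficient estimate.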
A common drawback of standard reinforcement learning algorithms is their inabi...
TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works b...
We introduce two new temporal difference (TD) algorithms based on the theory of linear least-squares ...
We present new theoretical and empirical results with the iLSTD algorithm for policy evaluation in r...