We consider the problem of finding the best features for value function approximation in reinforcement learning and develop an online algorithm to optimize the mean squared Bellman error objective. For any given set of features, our algorithm performs a gradient search in the parameter space via a residual gradient scheme and, on a slower timescale, also performs a gradient search in the Grassmann manifold of features. We present a proof of convergence of our algorithm. We also show empirical results using our algorithm, as well as a similar algorithm that uses temporal difference learning in place of the residual gradient scheme for the faster-timescale updates.
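To make the two-timescale structure concrete, here is a minimal sketch, assuming a linear architecture V(s) = φ(s)ᵀθ with an orthonormal feature matrix Φ whose column span represents a point on the Grassmann manifold. The names (theta, Phi, alpha, beta) and the QR retraction are illustrative choices for the purpose of this sketch, not the paper's exact construction.

```python
import numpy as np

# Minimal sketch of the two-timescale scheme described in the abstract,
# under assumed notation: linear values V(s) = phi(s)^T theta, with an
# orthonormal feature matrix Phi (columns = features) standing in for a
# point on the Grassmann manifold. All names here are illustrative.

def residual_gradient_step(theta, phi_s, phi_next, r, gamma, alpha):
    """Fast timescale: exact gradient descent on the squared Bellman error
    delta^2, where delta = r + gamma * phi(s')^T theta - phi(s)^T theta."""
    delta = r + gamma * phi_next @ theta - phi_s @ theta
    grad = delta * (gamma * phi_next - phi_s)  # gradient of 0.5 * delta^2 w.r.t. theta
    return theta - alpha * grad

def grassmann_step(Phi, euclid_grad, beta):
    """Slow timescale: project the Euclidean gradient of the objective
    w.r.t. Phi onto the tangent space at span(Phi), take a step, and
    retract back to an orthonormal representative via QR."""
    tangent = euclid_grad - Phi @ (Phi.T @ euclid_grad)  # (I - Phi Phi^T) G
    Q, _ = np.linalg.qr(Phi - beta * tangent)
    return Q
```

The timescale separation (beta much smaller than alpha) is what lets the parameter iterates effectively equilibrate for the current features before the features themselves move, matching the fast/slow structure described in the abstract.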
This work presents the restricted gradient-descent (RGD) algorithm, a training method for lo...
We introduce and empirically evaluate two novel online gradient-based reinforcement learning algorit...
Reinforcement learning is often done using parameterized function approximators to store value funct...
This paper addresses the problem of automatic generation of features for value function approximatio...
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to...
We establish connections from optimizing the Bellman Residual and Temporal Difference Loss to worst-case ...
Reinforcement learning deals with the problem of sequential decision making in uncertain stochastic ...
A common solution approach to reinforcement learning problems with large state spaces (where value f...
We establish a connection between optimizing the Bellman Residual and worst case long-term predictiv...
In reinforcement learning it is frequently necessary to resort to an approximation to the true optim...
This paper explores a new framework for reinforcement learning based on online convex optimization, ...
Most successful examples of Reinforcement Learning (RL) report the use of carefully designe...
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide ...
Graduation date: 2007. The thesis focuses on model-based approximation methods for reinforcement le...