A key open problem in reinforcement learning is to assure convergence when using a compact hy-pothesis class to approximate the value function. Although the standard temporal-difference learning algorithm has been shown to converge when the hy-pothesis class is a linear combination of fixed ba-sis functions, it may diverge with a general (non-linear) hypothesis class. This paper describes the Bridge algorithm, a new method for reinforcement learning, and shows that it converges to an approxi-mate global optimum for any agnostically learnable hypothesis class. Convergence is demonstrated on a simple example for which temporal-difference learning fails. Weak conditions are identified un-der which the Bridge algorithm converges for any hypothe...
We address the problem of computing the optimal Q-function in Markov decision prob-lems with infinit...
Learning algorithms for feedforward connectionist systems in a reinforcement learning environment ar...
Recent developments in the area of reinforcement learning have yielded a number of new algorithms fo...
Reinforcement learning algorithms comprise a class of learning algorithms for neural networks. Reinf...
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a n...
International audienceAlong with the sharp increase in visibility of the field, the rate at which ne...
Abstract — A theoretical analysis of Model-Based Temporal Difference Learning for Control is given, ...
Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are...
Reinforcement learning is defined as the problem of an agent that learns to perform a certain task t...
In this paper, we propose conditions under which Q iteration using decision trees for function app...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
A theoretical analysis of Model-Based Temporal Difference Learning for Control is given, leading to...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide ...
Learning algorithms for feedforward connectionist systems in a reinforcement learning environment ar...
We address the problem of computing the optimal Q-function in Markov decision prob-lems with infinit...
Learning algorithms for feedforward connectionist systems in a reinforcement learning environment ar...
Recent developments in the area of reinforcement learning have yielded a number of new algorithms fo...
Reinforcement learning algorithms comprise a class of learning algorithms for neural networks. Reinf...
We explore fixed-horizon temporal difference (TD) methods, reinforcement learning algorithms for a n...
International audienceAlong with the sharp increase in visibility of the field, the rate at which ne...
Abstract — A theoretical analysis of Model-Based Temporal Difference Learning for Control is given, ...
Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are...
Reinforcement learning is defined as the problem of an agent that learns to perform a certain task t...
In this paper, we propose conditions under which Q iteration using decision trees for function app...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
A theoretical analysis of Model-Based Temporal Difference Learning for Control is given, leading to...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide ...
Learning algorithms for feedforward connectionist systems in a reinforcement learning environment ar...
We address the problem of computing the optimal Q-function in Markov decision prob-lems with infinit...
Learning algorithms for feedforward connectionist systems in a reinforcement learning environment ar...
Recent developments in the area of reinforcement learning have yielded a number of new algorithms fo...