Abstract. Although tabular reinforcement learning (RL) methods have been proven to converge to an optimal policy, combining certain conventional RL techniques with function approximators can lead to divergence. In this paper we show why off-policy RL methods combined with linear function approximators can diverge. Furthermore, we analyze two types of updates: standard and averaging RL updates. Although averaging RL will not diverge, we show that it can converge to wrong value functions. In our experiments we compare standard and averaging value iteration (VI) with CMACs; the results show that for small values of the discount factor averaging VI works better, whereas for large values o...
In this paper we revisit the method of off-policy corrections for reinforcement learning (COP-TD) pi...
Along with the sharp increase in visibility of the field, the rate at which ne...
A key open problem in reinforcement learning is to assure convergence when using a compact hypothes...
Many algorithms for approximate reinforcement learning are not known to converge. In fact, there are...
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide ...
Since their introduction a year ago, distributional approaches to reinforcement learning (distributi...
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to...
The application of reinforcement learning to problems with continuous domains requires representing ...
There are two classes of average reward reinforcement learning (RL) algorithms: model-based ones tha...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
The approximation inaccuracy of the value function in reinforcement learning (RL) algorithms unavoid...
Reinforcement learning is often done using parameterized function approximators to store value funct...
On large problems, reinforcement learning systems must use parameterized function approximators such...
On large problems, reinforcement learning systems must use parameterized function...