In this paper we investigate the application of natural gradient descent to Bellman error based reinforcement learning algorithms. This combination is interesting because natural gradient descent is invariant to the parameterization of the value function. This invariance property means that natural gradient descent adapts its update directions to correct for poorly conditioned representations. We present and analyze quadratic and linear time natural temporal difference learning algorithms, and prove that they are covariant. We conclude with experiments which suggest that the natural algorithms can match or outperform their non-natural counterparts using linear function approximation, and drastically improve upon their non-natural counterpar...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially ...
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to...
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and fu...
Value functions derived from Markov decision processes arise as a central component of algorithms as...
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ide...
Gradient descent or its variants are popular in training neural networks. However, in deep Q-learnin...
We present four new reinforcement learning algorithms based on actor–critic, function ap-proximation...
Abstract—A common drawback of standard reinforcement learning algorithms is their inability to scale...
International audienceA common drawback of standard reinforcement learning algorithms is their inabi...
We analyse natural gradient learning in a two-layer feed-forward neural network using a statistical ...
Reinforcement learning is often done using parameterized function approximators to store value funct...
International audienceWe present four new reinforcement learning algorithms based on actor-critic, f...
Abstract — Temporal difference (TD) learning fam-ily tries to learn a least-squares solution of an a...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially ...
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to...
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and fu...
Value functions derived from Markov decision processes arise as a central component of algorithms as...
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ide...
Gradient descent or its variants are popular in training neural networks. However, in deep Q-learnin...
We present four new reinforcement learning algorithms based on actor–critic, function ap-proximation...
Abstract—A common drawback of standard reinforcement learning algorithms is their inability to scale...
International audienceA common drawback of standard reinforcement learning algorithms is their inabi...
We analyse natural gradient learning in a two-layer feed-forward neural network using a statistical ...
Reinforcement learning is often done using parameterized function approximators to store value funct...
International audienceWe present four new reinforcement learning algorithms based on actor-critic, f...
Abstract — Temporal difference (TD) learning fam-ily tries to learn a least-squares solution of an a...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
Many interesting problems in reinforcement learning (RL) are continuous and/or high dimensional, and...
Solving a reinforcement learning (RL) problem poses two competing challenges: fitting a potentially ...
A number of reinforcement learning algorithms have been developed that are guaranteed to converge to...