We consider the use of two additive control variate methods to reduce the variance of performance gradient estimates in reinforcement learning problems. The first approach we consider is the baseline method, in which a function of the current state is added to the discounted value estimate. We relate the performance of these methods, which use sample paths, to the variance of estimates based on iid data. We derive the baseline function that minimizes this variance, and we show that the variance for any baseline is the sum of the optimal variance and a weighted squared distance to the optimal baseline. We show that the widely used average discounted value baseline (where the reward is replaced by the difference between the reward and its exp...
Learning strategies for imperfect information games from samples of interaction is a challenging pro...
Policy gradient (PG) reinforcement learning algorithms have strong (local) con-vergence guarantees, ...
Policy gradient (PG) reinforcement learning algorithms have strong (local) con-vergence guarantees, ...
Policy-gradient methods in Reinforcement Learning(RL) are very universal and widely applied in pract...
There exist a number of reinforcement learning algorithms which learn by climbing the gradient of ex...
Many problems involve the use of models which learn probability distributions or incorporate randomn...
In a reinforcement learning task an agent must learn a policy for performing actions so as to perfo...
Policy gradient methods are a widely used class of model-free reinforcement learning algorithms wher...
Policy gradient methods are a widely used class of model-free reinforcement learning algorithms wher...
© 6th International Conference on Learning Representations, ICLR 2018 - Workshop Track Proceedings. ...
Reinforcement learning algorithms are typically geared towards optimizing the expected return of an ...
In many sequential decision-making problems we may want to manage risk by minimizing some measure of...
Policy gradient algorithms are widely used in reinforcement learning problems with con-tinuous actio...
For continuing environments, reinforcement learning (RL) methods commonly maximize the discounted re...
Abstract. We present a general method for maintaining estimates of the distribution of parameters in...
Learning strategies for imperfect information games from samples of interaction is a challenging pro...
Policy gradient (PG) reinforcement learning algorithms have strong (local) con-vergence guarantees, ...
Policy gradient (PG) reinforcement learning algorithms have strong (local) con-vergence guarantees, ...
Policy-gradient methods in Reinforcement Learning(RL) are very universal and widely applied in pract...
There exist a number of reinforcement learning algorithms which learn by climbing the gradient of ex...
Many problems involve the use of models which learn probability distributions or incorporate randomn...
In a reinforcement learning task an agent must learn a policy for performing actions so as to perfo...
Policy gradient methods are a widely used class of model-free reinforcement learning algorithms wher...
Policy gradient methods are a widely used class of model-free reinforcement learning algorithms wher...
© 6th International Conference on Learning Representations, ICLR 2018 - Workshop Track Proceedings. ...
Reinforcement learning algorithms are typically geared towards optimizing the expected return of an ...
In many sequential decision-making problems we may want to manage risk by minimizing some measure of...
Policy gradient algorithms are widely used in reinforcement learning problems with con-tinuous actio...
For continuing environments, reinforcement learning (RL) methods commonly maximize the discounted re...
Abstract. We present a general method for maintaining estimates of the distribution of parameters in...
Learning strategies for imperfect information games from samples of interaction is a challenging pro...
Policy gradient (PG) reinforcement learning algorithms have strong (local) con-vergence guarantees, ...
Policy gradient (PG) reinforcement learning algorithms have strong (local) con-vergence guarantees, ...