By enabling correct differentiation in Stochastic Computation Graphs (SCGs), the infinitely differentiable Monte-Carlo estimator (DiCE) can generate correct estimates for the higher-order gradients that arise in, e.g., multi-agent reinforcement learning and meta-learning. However, the baseline term in DiCE that serves as a control variate for reducing variance applies only to first-order gradient estimation, limiting the utility of higher-order gradient estimates. To improve the sample efficiency of DiCE, we propose a new baseline term for higher-order gradient estimation. This term may be easily included in the objective, and produces unbiased variance-reduced estimators under (automatic) differentiation, without affecting the estimate of the objective itself.
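For readers unfamiliar with the mechanism the abstract refers to, the sketch below illustrates the DiCE MagicBox operator and the standard first-order baseline term it is combined with (the term the paper generalizes to higher orders). This is a minimal JAX illustration under the assumption of an episodic RL setting, not the authors' implementation; the names magic_box, dice_objective, logps, costs, and baselines are introduced here for exposition.

```python
import jax
import jax.numpy as jnp

def magic_box(logp):
    # DiCE MagicBox: evaluates to 1 in the forward pass, while its gradient
    # with respect to the parameters equals the gradient of logp, so
    # magic_box(logp) * cost differentiates to the score-function estimator.
    return jnp.exp(logp - jax.lax.stop_gradient(logp))

def dice_objective(logps, costs, baselines):
    # logps[t]:     log-probability of the action sampled at step t (a function of theta)
    # costs[t]:     cost (or negative reward) observed at step t
    # baselines[t]: control variate for step t, independent of the step-t action
    cum_logp = jnp.cumsum(logps)                      # log-prob of all stochastic nodes influencing cost t
    surrogate = jnp.sum(magic_box(cum_logp) * costs)  # DiCE surrogate: correct under repeated differentiation
    # First-order baseline term: evaluates to 0, leaving the objective estimate
    # unchanged, but its gradient subtracts baselines[t] * grad(logps[t]) and so
    # reduces the variance of the first-order estimator only.
    surrogate = surrogate + jnp.sum((1.0 - magic_box(logps)) * baselines)
    return surrogate

# Illustrative use (logp_fn is a hypothetical map from parameters and actions to log-probs):
# grad_fn = jax.grad(lambda theta: dice_objective(logp_fn(theta, actions), costs, baselines))
```

Differentiating this surrogate once recovers REINFORCE with a baseline; differentiating it again yields higher-order estimates, which is where the paper's proposed baseline term (not shown here) adds further variance reduction.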
Gradient estimation is often necessary for fitting generative models with discrete latent variables,...
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and determ...
We consider the problem of efficiently estimating gradients from stochastic simulation. Although the...
The score function estimator is widely used for estimating gradients of stochastic objectives in sto...
Modelers use automatic differentiation (AD) of computation graphs to implement complex deep learning...
Optimizing via stochastic gradients is a powerful and flexible technique ubiquitously used in machine ...
Discrete expectations arise in various machine learning tasks, and we often need to backpropagate th...
It seems that in the current age, computers, computation, and data have an increasingly imp...
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...
Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by f...
The reparameterization trick enables optimizing large scale stochastic computation graphs via gradie...
Optimization with noisy gradients has become ubiquitous in statistics and machine learning. Reparame...
We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for q...