Policy gradient methods are among the most effective methods for large-scale reinforcement learning, and their empirical success has prompted several works that develop the foundation of their global convergence theory. However, prior works have either required exact gradients or state-action visitation measure based mini-batch stochastic gradients with a diverging batch size, which limit their applicability in practical scenarios. In this paper, we consider classical policy gradient methods that compute an approximate gradient with a single trajectory or a fixed size mini-batch of trajectories under soft-max parametrization and log-barrier regularization, along with the widely-used REINFORCE gradient estimation procedure. By controlling th...
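The setup this abstract describes can be illustrated with a minimal sketch: a single-sample REINFORCE gradient for a softmax-parametrized policy with a log-barrier regularizer. The two-armed bandit environment, the reward values, and the step sizes below are all hypothetical stand-ins chosen for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(theta):
    z = np.exp(theta - theta.max())
    return z / z.sum()

# Hypothetical 2-armed bandit as a stand-in for an MDP:
# arm 0 pays 1.0, arm 1 pays 0.2.
REWARDS = np.array([1.0, 0.2])

def reinforce_grad(theta, lam=0.01):
    """Single-trajectory REINFORCE estimate of the gradient of
    J(theta) = E[r] + lam * sum_a log pi_a(theta)  (log-barrier term)."""
    pi = softmax(theta)
    a = rng.choice(len(pi), p=pi)          # sample one action (one "trajectory")
    r = REWARDS[a]
    # grad of log pi(a) under softmax parametrization: e_a - pi
    glogpi = -pi.copy()
    glogpi[a] += 1.0
    # grad of sum_a log pi_a under softmax: 1 - n * pi
    barrier = 1.0 - len(pi) * pi
    return r * glogpi + lam * barrier

theta = np.zeros(2)
for _ in range(2000):
    theta += 0.1 * reinforce_grad(theta)   # plain stochastic gradient ascent

print(softmax(theta))  # the better arm should receive most of the probability
```

The log-barrier term keeps every action probability bounded away from zero, which is the mechanism such analyses use to control the variance of single-sample gradient estimates.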
Obtaining first-order regret bounds -- regret bounds scaling not as the worst-case but with some mea...
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and determ...
In this paper, we propose an extension to the policy gradient algorithms by allowing st...
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
Policy gradient methods are among the best Reinforcement Learning (RL) techniq...
Policy gradient methods have been frequently applied to problems in control and reinforcement learni...
Policy gradient methods are a class of reinforcement learning techniques that rely upon optimizing pa...
In this paper, we consider an infinite horizon average reward Markov Decision Process (MDP). Disting...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy...
Policy gradient (PG) reinforcement learning algorithms have strong (local) convergence guarantees, ...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...