International audiencePolicy gradient methods are among the best Reinforcement Learning (RL) techniques to solve complex control problems. In real-world RL applications, it is common to have a good initial policy whose performance needs to be improved and it may not be acceptable to try bad policies during the learning process. Although several methods for choosing the step size exist, research paid less attention to determine the batch size, that is the number of samples used to estimate the gradient direction for each update of the policy parameters. In this paper, we propose a set of methods to jointly optimize the step and the batch sizes that guarantee (with high probability) to improve the policy performance after each update. Besides...
While reinforcement learning (RL) algorithms have been successfully applied to a wide range of probl...
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement le...
Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper propo...
International audiencePolicy gradient methods are among the best Reinforcement Learning (RL) techniq...
Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
peer reviewedIn this paper, we propose an extension to the policy gradient algorithms by allowing st...
We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated ...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
Policy gradient methods are among the most effective methods for large-scale reinforcement learning,...
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...
This paper is about the exploitation of Lipschitz continuity properties for Markov Decision Processe...
International audienceIn reinforcement learning, an agent collects information interacting with an e...
International audienceIn reinforcement learning, policy gradient algorithms optimize the policy dire...
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized...
While reinforcement learning (RL) algorithms have been successfully applied to a wide range of probl...
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement le...
Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper propo...
International audiencePolicy gradient methods are among the best Reinforcement Learning (RL) techniq...
Policy Gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
peer reviewedIn this paper, we propose an extension to the policy gradient algorithms by allowing st...
We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated ...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
Policy gradient methods are among the most effective methods for large-scale reinforcement learning,...
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...
This paper is about the exploitation of Lipschitz continuity properties for Markov Decision Processe...
International audienceIn reinforcement learning, an agent collects information interacting with an e...
International audienceIn reinforcement learning, policy gradient algorithms optimize the policy dire...
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized...
While reinforcement learning (RL) algorithms have been successfully applied to a wide range of probl...
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement le...
Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper propo...