We study policy gradient for mean-field control in continuous time in a reinforcement learning setting. By considering randomised policies with entropy regularisation, we derive a gradient expectation representation of the value function that is amenable to actor-critic type algorithms. In these algorithms, the value functions and the policies are learnt alternately, by either offline or online learning, from observation samples of the state and a model-free estimate of the population state distribution. In the linear-quadratic mean-field framework, we obtain an exact parametrisation of the actor and critic functions defined on the Wasserstein space. Finally, we illustrate the results of our algorithms with some numerical experiments on concrete...
Motivated by the recent applications of game-theoretical learning techniques to the design of distri...
Reinforcement learning algorithms are typically geared towards optimizing the expected return of an ...
In this paper we address reinforcement learning problems with continuous state-action spac...
Learning in real-world domains often requires dealing with continuous state and action spaces. Alth...
Many traditional reinforcement-learning algorithms have been designed for problems with ...
Policy gradient based actor-critic algorithms are amongst the most popular algorithms in th...
Classical control theory requires a model to be derived for a system, before any control design can ...
We present four new reinforcement learning algorithms based on actor-critic, natural-gradient and fu...
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ide...
This paper presents a reinforcement learning framework for continuous-time dynamical systems without...
This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our alg...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method ...
In this manuscript, we develop reinforcement learning theory and algorithms for differential games w...
We study policy gradient (PG) for reinforcement learning in continuous time and space under the regu...