International audienceMuch of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms, with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
Off-policy model-free deep reinforcement learning methods using previously collected data can improv...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...
International audienceMuch of the recent success of deep reinforcement learning has been driven by r...
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functio...
Multi-task deep reinforcement learning (DRL) ambitiously aims to train a general agent that masters ...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy it...
We introduce a new constrained optimization method for policy gradient reinforcement learning, which...
Reinforcement Learning aims to train autonomous agents in their interaction with the environment by ...
An increasing number of complex problems have naturally posed significant challenges in decision-mak...
Trust region policy optimization (TRPO) is a popular and empirically successful policy search algori...
Reinforcement learning (RL) focuses on an essential aspect of intelligent behavior – how an agent ca...
Graduation date: 2005Reinforcement learning (RL) is the study of systems that learn from interaction...
Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this pape...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
Off-policy model-free deep reinforcement learning methods using previously collected data can improv...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...
International audienceMuch of the recent success of deep reinforcement learning has been driven by r...
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functio...
Multi-task deep reinforcement learning (DRL) ambitiously aims to train a general agent that masters ...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy it...
We introduce a new constrained optimization method for policy gradient reinforcement learning, which...
Reinforcement Learning aims to train autonomous agents in their interaction with the environment by ...
An increasing number of complex problems have naturally posed significant challenges in decision-mak...
Trust region policy optimization (TRPO) is a popular and empirically successful policy search algori...
Reinforcement learning (RL) focuses on an essential aspect of intelligent behavior – how an agent ca...
Graduation date: 2005Reinforcement learning (RL) is the study of systems that learn from interaction...
Policy Optimization (PO) is a widely used approach to address continuous control tasks. In this pape...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
Off-policy model-free deep reinforcement learning methods using previously collected data can improv...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...