ICML 2019. Many recent successful (deep) reinforcement learning algorithms make use of regularization, generally based on entropy or Kullback-Leibler divergence. We propose a general theory of regularized Markov Decision Processes that generalizes these approaches in two directions: we consider a larger class of regularizers, and we consider the general modified policy iteration approach, encompassing both policy iteration and value iteration. The core building blocks of this theory are a notion of regularized Bellman operator and the Legendre-Fenchel transform, a classical tool of convex optimization. This approach allows for error propagation analyses of general algorithmic schemes of which (possibly variants of) class...
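The regularized Bellman operator and the Legendre-Fenchel transform mentioned above can be illustrated on a small tabular MDP. The sketch below assumes the negative-entropy regularizer Omega(pi) = tau * sum_a pi(a) log pi(a): its conjugate over the probability simplex is tau * log sum_a exp(q(a) / tau), and the gradient of that conjugate, softmax(q / tau), is the regularized greedy policy. It runs a minimal regularized modified policy iteration in which m = 1 corresponds to regularized value iteration and large m approaches regularized policy iteration. The function names (`q_from_v`, `soft_greedy`, `neg_entropy`, `soft_bellman_optimal`, `reg_mpi`), the random test MDP, and the choice of entropy as the regularizer are illustrative assumptions, not the paper's code.

```python
import numpy as np


def q_from_v(P, R, v, gamma):
    """q_v(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) v(s'). Shapes: P (S, A, S), R (S, A), v (S,)."""
    return R + gamma * P @ v


def soft_greedy(q, tau):
    """Assumed entropy regularizer: gradient of tau * logsumexp(q / tau), i.e. softmax(q / tau) per state."""
    z = (q - q.max(axis=1, keepdims=True)) / tau  # shift by the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def neg_entropy(pi, tau):
    """Omega(pi(.|s)) = tau * sum_a pi log pi for each state, with 0 log 0 taken as 0."""
    logp = np.log(np.where(pi > 0, pi, 1.0))
    return tau * np.sum(pi * logp, axis=1)


def soft_bellman_optimal(P, R, v, gamma, tau):
    """Regularized optimal operator Omega*(q_v) = tau * logsumexp(q_v / tau), computed stably."""
    q = q_from_v(P, R, v, gamma)
    qmax = q.max(axis=1)
    return qmax + tau * np.log(np.exp((q - qmax[:, None]) / tau).sum(axis=1))


def reg_mpi(P, R, gamma, tau, m, iters):
    """Hypothetical helper: regularized modified policy iteration with m evaluation steps per iteration."""
    v = np.zeros(P.shape[0])
    pi = None
    for _ in range(iters):
        # Greedy step: pi_{k+1} = grad Omega*(q_{v_k}) = softmax(q_{v_k} / tau)
        pi = soft_greedy(q_from_v(P, R, v, gamma), tau)
        # Partial evaluation: v_{k+1} = (T_{pi_{k+1}, Omega})^m v_k,
        # where T_{pi, Omega} v = sum_a pi q_v - Omega(pi)
        for _ in range(m):
            q = q_from_v(P, R, v, gamma)
            v = np.sum(pi * q, axis=1) - neg_entropy(pi, tau)
    return v, pi


# Usage on a small random MDP (illustrative data only)
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into transition probabilities
R = rng.random((S, A))
v_vi, _ = reg_mpi(P, R, gamma=0.9, tau=0.5, m=1, iters=300)   # regularized value iteration
v_pi, _ = reg_mpi(P, R, gamma=0.9, tau=0.5, m=50, iters=300)  # close to regularized policy iteration
```

With m = 1, one evaluation step under the softmax policy returns exactly tau * log sum_a exp(q(a) / tau), so each iteration applies the regularized optimal operator once; larger m spends more evaluation work per greedy step. Both settings should approach the same regularized optimal value.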
We provide an algorithm that achieves the optimal regret rate in an unknown weakly communicating Mar...
Policy regularization methods such as maximum entropy regularization are widely used in reinforcemen...
We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy giv...
In the Bayesian reinforcement learning (RL) setting, a prior distribution over the unknown problem p...
Reinforcement learning with linear and non-linear function approximation has been studied e...
Reinforcement Learning (RL) provides a general methodology to solve complex uncertain decision probl...
Reinforcement learning (RL) is an important field of research in machine learning that is increasing...
An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthes...
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of th...
Monte-Carlo planning and Reinforcement Learning (RL) are essential to sequential decision making. Th...
Robust Markov decision processes (MDPs) provide a general framework to model decision problems where...
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In ...
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functio...