This thesis studies policy iteration methods with linear approximation of the value function for problems with large state spaces in the reinforcement learning context. We first introduce a unified algorithm that generalizes the main stochastic optimal control methods. We show that this unified algorithm converges to the optimal value function in the tabular case, and we establish a performance bound in the approximate case, when the value function is estimated. We then extend the literature on second-order linear approximation algorithms by proposing a generalization of Least-Squares Policy Iteration (LSPI) (Lagoudakis and Parr, 2003). Our new algorithm, Least-Squares λ Policy Iteration (LSλPI), adds to LSPI an idea of λ-Policy Iteration (Bertsekas and Io...
Least-squares methods have been successfully used for prediction problems in the cont...
In this paper, we report a performance bound for the widely used least-squares policy iteration (LS...
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that c...
This thesis is concerned with policy iteration methods in reinforcement learning...
Approximate reinforcement learning deals with the essential problem of applying reinforcement learni...
In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm f...
Tetris is a video game that has been widely used as a benchmark for various op...
We propose a new approach to reinforcement learning which combines least squares func...
Reinforcement learning describes how an agent can learn to act in an unknown environment in order to...
We consider the discrete-time infinite-horizon optimal control problem formali...
We propose a new approach to reinforcement learni...
We consider the discrete-time infinite-horizon optimal control problem formalized by Markov decisio...
Policy iteration is the core procedure for solving problems of reinforcement learning metho...
Reinforcement learning is a promising paradigm for learning optimal control. We conside...