This thesis studies policy iteration methods for reinforcement learning in large state spaces with linear approximation of the value function. We first propose a unification of the main algorithms of stochastic optimal control. We show that this unified version converges to the optimal value function in the tabular case, and we give a performance guarantee for the case where the value function is estimated approximately. We then extend the state of the art in second-order linear approximation algorithms by proposing a generalization of Least-Squares Policy Iteration (LSPI) (Lagoudakis and Parr, 2003). Our new algorithm, Least-Squares [lambda]...
In this paper, we report a performance bound for the widely used least-squares policy iteration (LSP...
Policy iteration is the core procedure for solving problems of reinforcement learning metho...
Approximate policy iteration methods based on temporal differences are popular in practice, and hav...
This thesis studies policy iteration methods with linear approximation of the value function for lar...
We consider the discrete-time infinite-horizon optimal control problem formalized by Markov decisio...
Tetris is a video game that has been widely used as a benchmark for various op...
Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebr...
We propose a new approach to reinforcement learning which combines least squares func...
Approximate reinforcement learning deals with the essential problem of applying reinforcement learni...
With the rapid advent of video games recently and the increasing numbers of players and gamers, only...