Policy evaluation is the working heart of the policy iteration algorithms commonly used and studied in the discounted setting of reinforcement learning: it estimates the value of states from samples of the Markov reward process induced by following a Markov policy in a Markov decision process. We propose a simple and efficient estimator, the loop estimator, that exploits the regenerative structure of Markov reward processes without explicitly estimating a full model. Our method has O(1) space complexity when estimating the value of a single positive recurrent state s, unlike TD, which requires O(S), or model-based methods, which require O(S^2). Moreover, the regenerative structure enables us to show, without relying on the generative model approach, th...
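The regenerative idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper; it assumes the standard renewal identity v(s) = alpha / (1 - beta), where alpha is the expected discounted reward collected between two consecutive visits to s and beta is the expected value of gamma raised to the loop length. The names `loop_estimate`, `step`, and `two_state_step`, as well as the convention that the reward is emitted on leaving a state, are hypothetical choices made for this example.

```python
import random


def loop_estimate(step, s, gamma, n_steps, rng):
    """Estimate the discounted value of state s from a single trajectory.

    Hypothetical interface: step(state, rng) returns (reward, next_state).
    Only a constant number of scalars is stored, whatever the state count.
    """
    n_loops = 0
    alpha_sum = 0.0  # sum over loops of the discounted reward inside the loop
    beta_sum = 0.0   # sum over loops of gamma ** loop_length
    g = 0.0          # discounted reward accumulated in the current loop
    disc = 1.0       # gamma ** (steps since the last visit to s)
    state = s
    for _ in range(n_steps):
        reward, state = step(state, rng)
        g += disc * reward
        disc *= gamma
        if state == s:  # the chain regenerates: one loop is complete
            n_loops += 1
            alpha_sum += g
            beta_sum += disc
            g, disc = 0.0, 1.0
    if n_loops == 0:
        raise ValueError("state s was never revisited; no loop observed")
    alpha = alpha_sum / n_loops
    beta = beta_sum / n_loops
    # beta < 1 because gamma < 1 and every loop has length at least 1.
    return alpha / (1.0 - beta)


def two_state_step(state, rng):
    """Toy Markov reward process (an assumption for illustration only):
    reward 1 on leaving state 0, reward 0 on leaving state 1,
    and the next state is drawn uniformly from {0, 1}."""
    reward = 1.0 if state == 0 else 0.0
    return reward, rng.randrange(2)


estimate = loop_estimate(two_state_step, s=0, gamma=0.9,
                         n_steps=100_000, rng=random.Random(0))
print(f"loop estimate of v(0): {estimate:.3f}")
```

For this toy chain, solving the Bellman equations by hand gives v(0) = 5.5, so the printed estimate should land close to that value. Note that the sketch keeps only running sums for a single state, which is where the constant-space claim comes from; it says nothing about the convergence rates analysed in the paper.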
We present the novel Kernel Rewards Regression (KRR) method for Policy Iteration in Reinforcement Le...
We consider a learning problem where the decision maker interacts with a standard Markov decision pr...
The running time of the classical algorithms of the Markov Decision Process (MDP) typically grows li...
Q-Learning is based on value iteration and remains the most popular choice for solving Marko...
We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving a...
We consider the problem of learning the optimal action-value function in discounted-reward...
We address performance issues associated with simulation-based algorithms for optimizing Markov rewar...
We consider a discrete time, finite state Markov reward process that depends on a set of pa...
Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012). We consid...
We consider the problem of finding an optimal policy in a Markov decision process that maximises the...
We consider Howard's policy iteration algorithm for multichained finite state and action Markov deci...
The most relevant problems in discounted reinforcement learning involve estimating the mean of a fun...
To tackle the potentially hard task of defining the reward function in a Marko...
We study the problem of achieving a given value in Markov decision processes (MDPs) with several ind...
For semi-Markov decision processes with discounted rewards we derive the well known results regardin...