Solving a semi-Markov decision process (SMDP) using value or policy iteration requires precise knowledge of the probabilistic model and suffers from the curse of dimensionality. To overcome these limitations, we present a reinforcement learning approach in which the SMDP performance criterion is optimised with respect to a family of parameterised policies. We propose an online algorithm that simultaneously estimates the gradient of the performance criterion and optimises it using stochastic approximation. We apply our algorithm to call admission control. Our proposed policy gradient SMDP algorithm and its application to admission control are novel. © 2006 Elsevier B.V. All rights reserved.
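The abstract does not state the update rule, so the following is only a minimal sketch of the general idea (an online score-function gradient estimate combined with a stochastic-approximation step), not the authors' algorithm. Everything in it is an assumption made for illustration: the toy model (one link, one call class, Poisson arrivals, exponential holding times, reward 1 per accepted call), the sigmoid policy, the discounted eligibility trace with factor `beta`, and all names such as `train` and `accept_prob`. The SMDP-specific part is that the per-step signal is `reward - eta * tau`, where `tau` is the random sojourn time and `eta` is a running estimate of the reward per unit time.

```python
import math
import random


def accept_prob(theta, n, capacity):
    """Sigmoid acceptance probability; features are a bias and the occupancy n/capacity."""
    score = theta[0] + theta[1] * n / capacity
    return 1.0 / (1.0 + math.exp(-score))


def train(steps=20000, capacity=5, lam=1.0, mu=0.5,
          step_size=0.01, beta=0.9, seed=0):
    """Hypothetical toy admission-control SMDP with decision epochs at call arrivals."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]           # policy parameters (assumed two-feature policy)
    trace = [0.0, 0.0]           # eligibility trace of score-function terms
    total_reward = total_time = 0.0
    eta = 0.0                    # running estimate of reward per unit time
    n = 0                        # calls in progress when the current call arrives

    for _ in range(steps):
        if n < capacity:
            p = accept_prob(theta, n, capacity)
            accept = rng.random() < p
            # Gradient of log pi(action | n) for a Bernoulli-sigmoid policy.
            coeff = (1.0 - p) if accept else -p
            grad = [coeff, coeff * n / capacity]
        else:
            accept = False       # link full: rejection is forced, no gradient term
            grad = [0.0, 0.0]

        reward = 1.0 if accept else 0.0
        n_after = n + 1 if accept else n

        # Sojourn time until the next arrival; each ongoing call survives it
        # independently with probability exp(-mu * tau) (memoryless holding times).
        tau = rng.expovariate(lam)
        survive = math.exp(-mu * tau)
        n = sum(rng.random() < survive for _ in range(n_after))

        # Stochastic-approximation step on the SMDP differential reward.
        trace = [beta * z + g for z, g in zip(trace, grad)]
        signal = reward - eta * tau
        theta = [t + step_size * signal * z for t, z in zip(theta, trace)]

        total_reward += reward
        total_time += tau
        eta = total_reward / total_time if total_time > 0 else 0.0

    return theta, eta


if __name__ == "__main__":
    theta, eta = train()
    print("theta:", theta, "estimated reward rate:", eta)
```

In this single-class toy model, accepting every call that fits is optimal, so the bias parameter `theta[0]` should drift upwards during training; a multi-class model with different rewards would be needed to make the trade-off non-trivial.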
Abstract. We model reinforcement learning as the problem of learning to control a partially observable...
Abstract. We present a Reinforcement Learning (RL) algorithm based on policy iteration for solving a...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
A large class of problems of sequential decision making under uncertainty, of which the underlying p...
We consider the learning problem under an online Markov decision process (MDP), which is aimed at le...
Abstract. We consider the learning problem under an online Markov decision process (MDP), which is a...
Policy-gradient algorithms are attractive as a scalable approach to learning approximate policies fo...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
Abstract. We consider a discrete time, finite state Markov reward process that depends on a set of pa...
We present a model-free reinforcement learning method for partially observable Markov decision probl...
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision P...
The Markov decision process (MDP) formulation used to model many real-world sequential decision maki...
Abstract. We present a model-free reinforcement learning method for partially observable Markov deci...
We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method ...