We propose a new approach to the problem of searching a space of stochastic controllers for a Markov decision process (MDP) or a partially observable Markov decision process (POMDP). Following several other authors, our approach is based on searching in parameterized families of policies (for example, via gradient descent) to optimize solution quality. However, rather than trying to estimate the values and derivatives of a policy directly, we do so indirectly, using estimates of the probability densities that the policy induces over states at different points in time. This enables our algorithms to exploit the many techniques for efficient and robust approximate density propagation in stochastic systems. We show how our techniques can be ...
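The idea of evaluating a policy, and its parameter derivatives, through the state densities it induces over time can be illustrated on a toy problem. The sketch below is a minimal illustration under loudly stated assumptions, not the authors' algorithm: a hypothetical two-state, two-action MDP (the transition table `P`, rewards `R`, and all function names are invented here), a finite horizon, a logistic policy with one parameter per state, and *exact* density propagation in place of the approximate propagation the abstract refers to. It pushes the state distribution d_t forward together with its derivatives with respect to θ (product rule), sums expected reward along the way, and runs plain gradient ascent on that value (equivalently, gradient descent on its negation).

```python
import math

# Hypothetical toy MDP (invented for illustration, not from the paper).
# P[s][a][s2] is the probability of moving from state s to s2 under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],    # from state 0: action 0, action 1
    [[0.7, 0.3], [0.05, 0.95]],  # from state 1: action 0, action 1
]
R = [0.0, 1.0]  # reward collected while in each state
N_STATES, N_ACTIONS = 2, 2


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def policy(theta, s):
    """Logistic two-action policy: theta[s] is the logit of action 1 in state s."""
    p1 = sigmoid(theta[s])
    return [1.0 - p1, p1]


def policy_grad(theta, s, i):
    """Derivative of pi(. | s) with respect to theta[i]; only theta[s] matters in s."""
    if i != s:
        return [0.0, 0.0]
    p1 = sigmoid(theta[s])
    g = p1 * (1.0 - p1)
    return [-g, g]


def value_and_grad(theta, d0, horizon):
    """Propagate the state density d_t and its derivatives jointly.

    Returns V = sum_{t=0..horizon} sum_s d_t(s) R(s) and the gradient dV/dtheta.
    """
    n = len(theta)
    d = list(d0)
    dd = [[0.0] * N_STATES for _ in range(n)]  # dd[i][s] = d d_t(s) / d theta[i]
    value = sum(p * r for p, r in zip(d, R))
    grad = [0.0] * n
    for _ in range(horizon):
        nxt = [0.0] * N_STATES
        dnxt = [[0.0] * N_STATES for _ in range(n)]
        for s in range(N_STATES):
            pi = policy(theta, s)
            dpi = [policy_grad(theta, s, i) for i in range(n)]
            for a in range(N_ACTIONS):
                for s2 in range(N_STATES):
                    nxt[s2] += d[s] * pi[a] * P[s][a][s2]
                    for i in range(n):
                        # Product rule on d_t(s) * pi(a | s).
                        dnxt[i][s2] += (dd[i][s] * pi[a] + d[s] * dpi[i][a]) * P[s][a][s2]
        d, dd = nxt, dnxt
        value += sum(p * r for p, r in zip(d, R))
        for i in range(n):
            grad[i] += sum(dp * r for dp, r in zip(dd[i], R))
    return value, grad


def gradient_ascent(theta, d0, horizon=10, lr=0.5, steps=200):
    """Plain gradient ascent on V (equivalently, descent on -V)."""
    for _ in range(steps):
        _, g = value_and_grad(theta, d0, horizon)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta


d0 = [1.0, 0.0]  # start in state 0 with certainty
theta = gradient_ascent([0.0, 0.0], d0)
print("learned logits:", theta)
print("value before:", value_and_grad([0.0, 0.0], d0, 10)[0])
print("value after:", value_and_grad(theta, d0, 10)[0])
```

Exact propagation like this costs O(|S|²·|A|) work per step and per parameter, so it only suits tiny state spaces; the abstract's point is that approximate density-propagation techniques can stand in for this step in larger stochastic systems.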
In this paper, we consider a modified version of the control problem in a model free Markov decision...
This thesis considers the question of how to most effectively conduct experiments in Partially Obser...
We propose a new method for learning policies for large, partially observable Markov decision proces...
The search for finite-state controllers for partially observable Markov decision processes (POMDPs) ...
In this paper, we bring techniques from operations research to bear on the problem of choosi...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Obse...
Gradient-based approaches to direct policy search in reinforcement learning have received much rece...
Partially Observable Markov Decision Processes (POMDPs) provide a rich representation for agents act...
Gradient-based approaches to direct policy search in reinforcement learning have received much recen...
This paper is about planning in stochastic domains by means of partially observable Markov decision...
The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligenc...
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework t...
Policy-gradient algorithms are attractive as a scalable approach to learning approximate policies fo...