Many real-world problems are inherently hi- erarchically structured. The use of this struc- ture in an agent's policy may well be the key to improved scalability and higher per- formance. However, such hierarchical struc- tures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the 'mixed option' policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy deter- mines the action. In this paper, we reformulate learning a hi- erarchical policy as a latent variable estima- tion problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can lear...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that ...
Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcem...
Many real-world problems are inherently hi- erarchically structured. The use of this struc- ture in ...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
{Many real hierarchically structured. The use of this structure in an agent's policy may well be the...
Many real-world problems are inherently hierarchically structured. The use of this structure in an...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that ...
Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcem...
Many real-world problems are inherently hi- erarchically structured. The use of this struc- ture in ...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
Many real-world problems are inherently hi-erarchically structured. The use of this struc-ture in an...
{Many real hierarchically structured. The use of this structure in an agent's policy may well be the...
Many real-world problems are inherently hierarchically structured. The use of this structure in an...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Policy search is a successful approach to reinforcement learning. However, policy improvements often...
Many reinforcement learning (RL) tasks, especially in robotics, consist of multiple sub-tasks that ...
Learning an optimal policy from a multi-modal reward function is a challenging problem in reinforcem...