A Markov chain Monte Carlo algorithm for Bayesian policy search

Tavakol Aghaei, Vahid
Onat, Ahmet
Yıldırım, Sinan

Publication date

October 2018

DOI

10.1080/21642583.2018.1528483

Publisher

Informa UK Limited

Abstract

Policy search algorithms have facilitated the application of reinforcement learning (RL) to dynamic systems, such as the control of robots. Many policy search algorithms are based on the policy gradient and may therefore suffer from slow convergence or from convergence to local optima. In this paper, we take a Bayesian approach to policy search under the RL paradigm, for the problem of controlling a discrete-time Markov decision process with continuous state and action spaces and a multiplicative reward structure. For this purpose, we assume a prior over policy parameters and aim for the ‘posterior’ distribution where the ‘likelihood’ is the expected reward. We propose a Markov chain Monte Carlo algorithm as a method of generating samples for policy parame...
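The idea stated in the abstract, a target distribution proportional to the prior times the expected reward, can be illustrated with a minimal Python sketch. This is not the paper's algorithm, which the truncated abstract does not spell out. It is a generic random-walk Metropolis-Hastings sampler in which the expected reward is replaced by a Monte Carlo estimate from simulated rollouts (a pseudo-marginal style choice made here for illustration). The toy scalar system, the linear policy, the per-step reward exp(-s^2/2), the Gaussian prior, and all function names and parameter values are hypothetical assumptions.

```python
import math
import random


def simulate_reward(theta, rng, horizon=10):
    """Roll out a hypothetical toy MDP once and return a multiplicative reward.

    Assumed for illustration only: scalar state s, noisy linear policy
    a = theta * s + noise, dynamics s' = s + a + noise, and a per-step
    reward exp(-s^2 / 2) multiplied over the horizon.
    """
    s = 1.0
    log_reward = 0.0
    for _ in range(horizon):
        a = theta * s + rng.gauss(0.0, 0.1)   # stochastic policy
        s = s + a + rng.gauss(0.0, 0.1)       # toy dynamics
        log_reward += -0.5 * s * s            # log of the per-step reward
    return math.exp(log_reward)               # product of per-step rewards


def estimate_expected_reward(theta, rng, n_rollouts=20):
    """Monte Carlo estimate of the expected reward, used as the 'likelihood'."""
    total = sum(simulate_reward(theta, rng) for _ in range(n_rollouts))
    return total / n_rollouts


def prior_density(theta, scale=2.0):
    """Unnormalised zero-mean Gaussian prior (an assumption of this sketch)."""
    return math.exp(-0.5 * (theta / scale) ** 2)


def mcmc_policy_search(n_iters=2000, step=0.3, seed=0):
    """Random-walk Metropolis-Hastings over the policy parameter theta."""
    rng = random.Random(seed)
    theta = 0.0
    target = prior_density(theta) * estimate_expected_reward(theta, rng)
    samples = []
    for _ in range(n_iters):
        proposal = theta + rng.gauss(0.0, step)
        proposal_target = (prior_density(proposal)
                           * estimate_expected_reward(proposal, rng))
        # Accept with probability min(1, proposal_target / target).
        # The estimate for the current state is kept, not recomputed.
        if rng.random() * target < proposal_target:
            theta, target = proposal, proposal_target
        samples.append(theta)
    return samples
```

Calling `mcmc_policy_search()` returns the chain of sampled parameters. In this toy system the state is driven to zero fastest when theta is close to -1, so after discarding an initial burn-in the samples should concentrate around that value rather than converge to a single point estimate, which is the practical difference from a gradient-based search.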
