Neural Networks, to appear. Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

Voot Tangkaratt
Syogo Mori
Tingting Zhao
Jun Morimoto
Masashi Sugiyama

Publication date

December 2014

Abstract

The goal of reinforcement learning (RL) is to let an agent learn an optimal control policy in an unknown environment so that future expected rewards are maximized. The model-free RL approach directly learns the policy based on data samples. Although using many samples tends to improve the accuracy of policy learning, collecting a large number of samples is often expensive in practice. On the other hand, the model-based RL approach first estimates the transition model of the environment and then learns the policy based on the estimated transition model. Thus, if the transition model is accurately learned from a small amount of data, the model-based approach is a promising alternative to the model-free approach. In this paper, we propose a no...
