Fitted Q-iteration in continuous action-space MDPs

Antos, Andras
Munos, Rémi
Szepesvari, Csaba

Publication date

January 2007

Publisher

HAL CCSD

Abstract

We consider continuous-state, continuous-action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration in which greedy action selection is replaced by a search over a restricted set of candidate policies for the one that maximizes the average action value. We provide a rigorous analysis of this algorithm, proving what we believe is the first finite-time bound for value-function-based algorithms for continuous state and action problems.
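
The variant described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the quadratic feature map `features`, the clipped linear policy class in `make_policy`, the finite grid `thetas`, the toy dynamics and reward, and the choice to average action values over the sampled next states. The regression step is plain least squares, standing in for whatever function class the analysis actually covers.

```python
import numpy as np

def features(x, a):
    """Hypothetical quadratic features of a scalar state x and scalar action a."""
    x, a = np.broadcast_arrays(np.asarray(x, float), np.asarray(a, float))
    return np.stack([np.ones_like(x), x, a, x * a, x**2, a**2], axis=-1)

def q_values(w, x, a):
    """Linear-in-features action-value estimate Q(x, a) = features(x, a) . w."""
    return features(x, a) @ w

def make_policy(theta):
    """Hypothetical policy class: clipped linear feedback a = clip(theta * x, -1, 1)."""
    return lambda x: np.clip(theta * x, -1.0, 1.0)

def best_policy(w, states, thetas):
    """Replace the greedy argmax over actions by a search over candidate
    policies: keep the one with the highest average Q-value on `states`."""
    scores = [q_values(w, states, make_policy(t)(states)).mean() for t in thetas]
    return float(thetas[int(np.argmax(scores))])

def fitted_q_iteration(x, a, r, x_next, thetas, gamma=0.9, n_iter=30):
    """Alternate a least-squares fit of Q with a policy search over `thetas`."""
    w = np.zeros(6)
    theta = best_policy(w, x_next, thetas)
    phi = features(x, a)
    for _ in range(n_iter):
        # Regression targets use the current candidate policy at the next state.
        targets = r + gamma * q_values(w, x_next, make_policy(theta)(x_next))
        w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
        theta = best_policy(w, x_next, thetas)
    return w, theta

# Toy batch (assumed for illustration): one-dimensional state and action,
# transitions collected under a uniformly random behaviour policy.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)
a = rng.uniform(-1, 1, n)
x_next = np.clip(0.9 * x + 0.5 * a + 0.01 * rng.standard_normal(n), -1, 1)
r = -(x**2) - 0.1 * a**2

thetas = np.linspace(-2.0, 2.0, 41)
w, theta = fitted_q_iteration(x, a, r, x_next, thetas)
```

Searching over a restricted policy set avoids solving a separate maximization over a continuous action at every sample, at the price of being limited to what the candidate set can represent.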

