Inspired by recent results on polynomial-time reinforcement learning algorithms that accumulate near-optimal rewards, we look at the related problem of quickly learning near-optimal policies. The new problem is closely related to the previous one, but differs in important ways. We provide simple algorithms for MDPs and for zero-sum and common-payoff stochastic games, together with a uniform framework for proving their polynomial complexity. Unlike the previously studied problem, these bounds use the minimum of the mixing time and a new quantity, the spectral radius. Unlike the previous results, ours apply uniformly to the average-reward and discounted cases.
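The abstract above bounds complexity by the minimum of the mixing time and a spectral quantity, but does not define either here. As a minimal illustration of the standard spectral view of mixing (not the paper's own definition of "spectral radius", which is a new quantity introduced there), the sketch below computes the second-largest eigenvalue modulus (SLEM) of a small stochastic matrix; the quantity 1/(1 - SLEM) is a common proxy for how fast the chain mixes. The matrix `P` is a made-up example.

```python
import numpy as np

# A hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
])

# For a stochastic matrix the largest eigenvalue is always 1;
# the second-largest modulus (SLEM) governs how fast the chain
# converges to its stationary distribution.
eigvals = np.linalg.eigvals(P)
slem = sorted(abs(eigvals), reverse=True)[1]

# A common spectral proxy for the mixing time: 1 / (1 - SLEM).
mixing_proxy = 1.0 / (1.0 - slem)
print(round(slem, 4), round(mixing_proxy, 3))
```

A smaller SLEM means faster mixing, which is why mixing-time-style bounds for reinforcement learning are often stated in terms of such spectral gaps.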
Solving multi-agent reinforcement learning problems has proven difficult because of the lack of trac...
We present new results on the efficiency of no-regret algorithms in the context of multiagent learn...
In this work we offer an (Formula presented.) pseudo-polynomial time deterministic algorithm for sol...
We present new algorithms for reinforcement learning, and prove that they have polynomial bounds on ...
We present a new algorithm for polynomial time learning of optimal behavior in single-contro...
We present a new algorithm for polynomial time learning of optimal behavior in stochastic games. Thi...
This paper gives the first polynomial-time algorithm for tabular Markov Decision Processes (MDP) tha...
Reinforcement learning with function approximation has recently achieved tremendous results in appli...
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer ...
An Unobservable MDP (UMDP) is a POMDP in which there are no observations. An Only-Costly-Observable ...
Preprint arXiv:1310.4953. Recent results of Ye and of Hansen, Miltersen and Zwick show that policy iterat...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
We present a novel and uniform formulation of the problem of reinforcement learning against bounded ...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...