We are interested in the problem of determining a course of action to achieve a desired objective in a non-deterministic environment. Markov decision processes (MDPs) provide a framework for representing this action selection problem, and there are a number of algorithms that learn optimal policies within this formulation. This framework has also been used to study state space abstraction, problem decomposition, and policy reuse. These techniques sacrifice optimality of their solution for improved learning speed. In this paper we examine the sub-optimality of reusing policies that are solutions to subproblems. This is done within a restricted class of MDPs, namely those where non-zero reward is received only upon reaching a goal state. We ...
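The restricted class described above — MDPs in which non-zero reward is received only upon reaching a goal state — can be illustrated with a small value-iteration sketch. The 4-state chain, action names, and transition probabilities below are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

# Hypothetical goal-based MDP: states 0..3, state 3 is the goal.
# Reward is non-zero (1.0) only on entering the goal state, matching
# the restricted MDP class described in the abstract above.
# P[a][s, s'] is the probability of moving from s to s' under action a.
n_states, gamma = 4, 0.9
P = {
    "right": np.array([[0.1, 0.9, 0.0, 0.0],
                       [0.0, 0.1, 0.9, 0.0],
                       [0.0, 0.0, 0.1, 0.9],
                       [0.0, 0.0, 0.0, 1.0]]),  # goal state is absorbing
    "stay": np.eye(n_states),
}
R = np.array([0.0, 0.0, 0.0, 1.0])  # reward only at the goal state

# Standard value iteration: V(s) <- max_a sum_s' P(s'|s,a) [R(s') + gamma V(s')]
V = np.zeros(n_states)
for _ in range(200):
    V = np.max([P[a] @ (R + gamma * V) for a in P], axis=0)
    V[3] = 0.0  # the goal is terminal: no reward accumulates after arrival

# Greedy policy with respect to the converged value function.
policy = {s: max(P, key=lambda a: (P[a] @ (R + gamma * V))[s])
          for s in range(n_states)}
```

In this sketch the optimal policy moves "right" in every non-goal state, and the values decay geometrically with distance from the goal, which is the structure such goal-reward MDPs share.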
The problem of making optimal decisions in uncertain conditions is central to Artificial Intelligenc...
Planning plays an important role in the broader field of decision theory. Planning has drawn much atte...
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of th...
This paper provides new techniques for abstracting the state space of a Markov Decision Process (MD...
A Markov decision process (MDP) relies on the notions of state, describing the current situation of ...
Infinite-horizon non-stationary Markov decision processes provide a general framework to model many ...
Problems involving optimal sequential decision making in uncertain dynamic systems arise in domains such as e...
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision...
In this note we focus attention on identifying optimal policies and on eliminating suboptima...
Markov decision problems (MDPs) provide the foundations for a number of problems of interest to AI r...
We introduce a class of Markov decision problems (MDPs) which greatly simplify Reinforcement Learnin...
We consider an agent interacting with an environment in a single stream of actions, observations, an...
Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs) have been proposed as a fram...