We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of greater than 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
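To make the combination of ingredients named above concrete — an anytime loop that maintains upper and lower bounds on the value function, a piecewise linear convex lower bound of alpha vectors, and heuristics that focus search where the bound gap is widest — the following is a minimal, illustrative sketch on a standard formulation of the classic Tiger POMDP. It is not the authors' implementation: the model parameters, the discount factor, the fixed MDP corner values for the sawtooth upper bound, and all helper names are assumptions made for brevity.

```python
import numpy as np

# Tiger POMDP (illustrative model, standard values): states {tiger-left, tiger-right},
# actions {listen, open-left, open-right}, observations {hear-left, hear-right}.
gamma, nA, nS, nO = 0.95, 3, 2, 2
R = np.array([[-1.0, -1.0],        # listen
              [-100.0, 10.0],      # open-left  (disastrous if the tiger is left)
              [10.0, -100.0]])     # open-right (disastrous if the tiger is right)
T = np.array([np.eye(2),                      # listening leaves the tiger in place
              np.full((2, 2), 0.5),           # opening a door resets the problem
              np.full((2, 2), 0.5)])
O = np.array([[[0.85, 0.15], [0.15, 0.85]],   # listening is 85% accurate
              [[0.5, 0.5], [0.5, 0.5]],
              [[0.5, 0.5], [0.5, 0.5]]])

def tau(b, a, o):
    """Belief update; returns (next belief, P(o | b, a))."""
    bn = O[a, :, o] * (b @ T[a])
    p = bn.sum()
    return (bn / p if p > 0 else b), p

# Lower bound: a PWLC set of alpha vectors, initialised with a trivial "worst reward forever" vector.
Gamma = [np.full(nS, R.min() / (1 - gamma))]
def V_lb(b):
    return max(float(al @ b) for al in Gamma)

# Upper bound: MDP values at the belief-simplex corners plus a sawtooth point set.
V_mdp = np.zeros(nS)
for _ in range(2000):
    V_mdp = (R + gamma * (T @ V_mdp)).max(axis=0)
ub_points = []                                 # list of (belief, value) pairs

def V_ub(b):
    base = float(b @ V_mdp)                    # corner-point interpolation
    dip = 0.0
    for bi, vi in ub_points:                   # sawtooth interpolation over stored points
        ratio = min(b[s] / bi[s] for s in range(nS) if bi[s] > 0)
        dip = min(dip, ratio * (vi - float(bi @ V_mdp)))
    return base + dip

def Q_ub(b, a):
    q = float(b @ R[a])
    for o in range(nO):
        bo, p = tau(b, a, o)
        if p > 0:
            q += gamma * p * V_ub(bo)
    return q

def backup(b):
    """Point-based backup of the lower bound and a Bellman update of the upper bound at b."""
    betas = []
    for a in range(nA):
        beta = R[a].copy()
        for o in range(nO):
            bo, _ = tau(b, a, o)
            alpha = max(Gamma, key=lambda al: float(al @ bo))
            beta = beta + gamma * T[a] @ (O[a, :, o] * alpha)
        betas.append(beta)
    Gamma.append(max(betas, key=lambda be: float(be @ b)))
    ub_points.append((b.copy(), max(Q_ub(b, a) for a in range(nA))))

def explore(b, eps, t):
    """One trial: descend while the discount-adjusted bound gap is too wide, then back up."""
    if V_ub(b) - V_lb(b) <= eps * gamma ** (-t):
        return
    a_star = max(range(nA), key=lambda a: Q_ub(b, a))    # act greedily w.r.t. the upper bound
    def excess(o):                                       # observation with the largest weighted gap
        bo, p = tau(b, a_star, o)
        return p * (V_ub(bo) - V_lb(bo) - eps * gamma ** (-(t + 1)))
    o_star = max(range(nO), key=excess)
    explore(tau(b, a_star, o_star)[0], eps, t + 1)
    backup(b)                                            # tighten both bounds on the way back

b0, eps = np.array([0.5, 0.5]), 0.5
for trial in range(100):
    if V_ub(b0) - V_lb(b0) <= eps:
        break
    explore(b0, eps, 0)
print(f"after {trial} trials: {V_lb(b0):.2f} <= V*(b0) <= {V_ub(b0):.2f}")
```

Each call to `explore` is one anytime trial: it descends along the upper-bound-greedy action and the observation with the largest probability-weighted bound gap, then tightens both bounds on the way back out. Interrupting the loop at any point still leaves valid upper and lower bounds, and their gap at the initial belief is the kind of provable, anytime regret bound the abstract refers to.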