Regret minimization is important in both the Multi-Armed Bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative to cumulative regret in MCTS, i.e., regret accumulated over time. Each type of regret is appropriate in different contexts. Although the majority of MCTS research applies the UCT selection policy for minimizing cumulative regret in the tree, this paper introduces a new MCTS variant, Hybrid MCTS (H-MCTS), which minimizes both types of regret in different parts of the tree. H-MCTS uses SHOT, a recursive version of Sequential Halving, to minimize simple regret near the root, and UCT to minimize cumulative regret when descend...
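For context on the scheme this abstract names, here is a minimal sketch of plain (non-recursive) Sequential Halving, the fixed-budget elimination routine that SHOT generalizes. The `arms`, `pull`, and `budget` names are illustrative assumptions, not part of the paper's interface: the sketch splits the budget over elimination rounds and keeps the empirically better half of the arms each round.

```python
import math

def sequential_halving(arms, pull, budget):
    """Return the arm with the best empirical mean after spending roughly
    `budget` pulls, halving the candidate set each round.

    arms   -- list of arm identifiers (illustrative)
    pull   -- callable arm -> stochastic reward in [0, 1] (assumed)
    budget -- total number of pulls to spread over all rounds
    """
    survivors = list(arms)
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    totals = {a: 0.0 for a in survivors}
    counts = {a: 0 for a in survivors}
    for _ in range(rounds):
        # Spread this round's share of the budget evenly over survivors.
        per_arm = max(1, budget // (len(survivors) * rounds))
        for a in survivors:
            for _ in range(per_arm):
                totals[a] += pull(a)
                counts[a] += 1
        # Keep the better half, ranked by empirical mean reward.
        survivors.sort(key=lambda a: totals[a] / counts[a], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```

Because every surviving arm receives the same share of each round's budget, the scheme spends samples to identify the best arm at the end rather than to maximize reward along the way, which is why it targets simple rather than cumulative regret.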
Abstract. Monte Carlo Tree Search (MCTS) has become a widely popular sample-based search algorithm...
Classical methods such as A* and IDA* are a popular and successful choice for one-player games. Howe...
Monte Carlo Tree Search (MCTS) has improved the performance of game engines in domains such as Go, H...
UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision pr...
Abstract—The application of multi-armed bandit (MAB) algorithms was a critical step in the developm...
UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS), is based on UCB, a policy for t...
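As a reminder of the UCB policy this abstract refers to, UCB1 selects the arm (or, in UCT, the child node) maximizing the empirical mean plus an exploration bonus, X_i + c * sqrt(ln N / n_i). A minimal sketch follows; the `visits` and `total_reward` fields and the constant `c` are illustrative assumptions, not a fixed API:

```python
import math

def ucb1_select(children, total_visits, c=math.sqrt(2)):
    """Pick the child maximizing mean reward plus exploration bonus:
    X_i + c * sqrt(ln(N) / n_i), where N is the parent's visit count.
    Children are assumed to expose `.visits` and `.total_reward`
    (illustrative field names). Unvisited children are tried first."""
    best, best_score = None, float("-inf")
    for child in children:
        if child.visits == 0:
            return child  # expand unvisited children before exploiting
        score = (child.total_reward / child.visits
                 + c * math.sqrt(math.log(total_visits) / child.visits))
        if score > best_score:
            best, best_score = child, score
    return best
```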
Abstract. Recently, Monte-Carlo Tree Search (MCTS) has advanced the field of computer Go substantial...
Monte Carlo tree search (MCTS) is state of the art for multiple games and problems. The base algorit...
In “Nonasymptotic Analysis of Monte Carlo Tree Search,” D. Shah, Q. Xie, and Z. Xu consider the pop...