Bandit algorithms for tree search

Pierre-Arnaud Coquelin

Publication date

January 2007

Abstract

Bandit-based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of Go [6]. Their efficient exploration of the tree makes it possible to return a good value rapidly, and to improve precision if more time is provided. The UCT algorithm [8], a tree search method based on Upper Confidence Bounds (UCB) [2], is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is “over-optimistic” in some sense, leading to a worst-case regret that may be very poor. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially in the horizon depth is analyzed. We then consider Flat-UCB performed on th...
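As a rough illustration of the upper-confidence-bound idea the abstract refers to, the following minimal sketch plays a flat multi-armed bandit with the standard UCB1 index (empirical mean plus sqrt(2 ln t / n)). The abstract does not state the exact bound used, so the index form, the Bernoulli rewards, and the names `ucb1_index` and `run_ucb1` are assumptions made for illustration only; this is not the paper's UCT, modified UCT, or Flat-UCB.

```python
import math
import random


def ucb1_index(mean_reward, pulls, total_pulls):
    """Assumed UCB1 index: empirical mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # unvisited arms are tried first
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_ucb1(arm_means, rounds, seed=0):
    """Play hypothetical Bernoulli arms with UCB1; return pulls per arm."""
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    sums = [0.0] * len(arm_means)
    for t in range(1, rounds + 1):
        # Compute the optimistic index of every arm at round t.
        indices = [
            ucb1_index(sums[i] / counts[i] if counts[i] else 0.0, counts[i], t)
            for i in range(len(arm_means))
        ]
        # Pull the arm with the largest index (ties go to the lowest arm).
        arm = max(range(len(arm_means)), key=indices.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Two arms with success probabilities 0.2 and 0.8 (illustrative values).
    print(run_ucb1([0.2, 0.8], rounds=2000))
```

In a tree-search setting, an index of this kind would be applied at each node to choose which child to descend into. The exploration bonus is what makes the choice optimistic, and the abstract's point is that this optimism can lead to very poor worst-case regret.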
