In recent years, state-of-the-art game-playing agents have often involved policies trained in self-play processes, in which Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project whose future goals include the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable.
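For concreteness, the cross-entropy loss referred to above is, in AlphaZero-style training, taken between the normalised root visit counts of MCTS and the trained policy. A minimal NumPy sketch of this objective (the function name and the smoothing constant are ours, for illustration only):

    import numpy as np

    def mcts_cross_entropy_loss(visit_counts, policy_probs):
        # Target distribution: MCTS root visit counts N(s, a), normalised.
        target = visit_counts / visit_counts.sum()
        # Cross-entropy against the trained policy pi_theta(a | s);
        # the small constant guards against log(0) for unvisited actions.
        return -np.sum(target * np.log(policy_probs + 1e-12))

    # A policy that mirrors the search distribution incurs a low loss:
    counts = np.array([80.0, 15.0, 5.0])
    policy = np.array([0.80, 0.15, 0.05])
    print(mcts_cross_entropy_loss(counts, policy))

Because the visit-count target retains whatever exploration MCTS performed at the root, a policy minimising this loss inherits that exploration, which is the behaviour the abstract above argues against for interpretability purposes.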
Monte-Carlo Tree Search (MCTS) is a recent paradigm for game-tree search, which gradually b...
The aim of general game playing (GGP) is to create programs capable of playing a wide range of diffe...
Local, spatial state-action features can be used to effectively train linear policies from self-play...
Recent Reinforcement Learning methods have combined function approximation and Monte Carlo Tree Sear...
This paper proposes using a linear function approximator, rather than a deep neural network (DNN), t...
Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play....
This paper proposes CARL, a pair of agents that apply reinforcement learning and function approximat...
Monte-Carlo Tree Search (MCTS) is a best-first search method guided by the results of Monte-Carlo si... [see the UCT sketch after these snippets]
Monte Carlo Tree Search (MCTS) with an appropriate tree policy may be used to approximate a minimax ...
Monte Carlo Tree Search (MCTS) is the state-of-the-art algorithm for General Game Playing (GGP). We ...
Playout Policy Adaptation (PPA) is a state-of-the-art strategy that has been proposed to control the...
Monte-Carlo Tree Search (MCTS) grows a partial game tree and uses a large number of random simulatio...
The success of Monte Carlo tree search (MCTS) in many games, where alpha-beta-based search has faile...
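Several of the snippets above describe MCTS as a best-first search that grows a partial game tree guided by random playouts. For reference, here is a minimal, self-contained UCT sketch on a toy game; the game, class names, and constants are illustrative and are not taken from any of the works cited above:

    import math, random

    class Node:
        # State: 'stones' left in the pile and the player to move (0 or 1).
        def __init__(self, stones, player, parent=None, move=None):
            self.stones, self.player = stones, player
            self.parent, self.move = parent, move
            self.children = []
            self.untried = [m for m in (1, 2, 3) if m <= stones]
            self.visits = 0
            self.wins = 0.0  # counted from the parent's (mover's) perspective

        def ucb1(self, c=math.sqrt(2)):
            return (self.wins / self.visits
                    + c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def rollout(stones, player):
        # Random playout: whoever takes the last stone wins.
        while True:
            stones -= random.choice([m for m in (1, 2, 3) if m <= stones])
            if stones == 0:
                return player
            player = 1 - player

    def mcts(root_stones, root_player, iterations=2000):
        root = Node(root_stones, root_player)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend through fully expanded nodes via UCB1.
            while not node.untried and node.children:
                node = max(node.children, key=Node.ucb1)
            # 2. Expansion: add one previously untried child, if any.
            if node.untried:
                m = node.untried.pop()
                child = Node(node.stones - m, 1 - node.player, parent=node, move=m)
                node.children.append(child)
                node = child
            # 3. Simulation: random playout (terminal nodes skip straight to scoring).
            if node.stones == 0:
                winner = 1 - node.player  # the previous mover took the last stone
            else:
                winner = rollout(node.stones, node.player)
            # 4. Backpropagation: credit wins from each node's mover's perspective.
            while node is not None:
                node.visits += 1
                if node.parent is not None and winner == node.parent.player:
                    node.wins += 1
                node = node.parent
        # Recommend the most-visited root move.
        return max(root.children, key=lambda n: n.visits).move

    print(mcts(10, 0))  # with 10 stones, optimal play removes 2 (leaving a multiple of 4)

Each iteration performs the four canonical phases: selection by UCB1, expansion of one child, a random playout, and backpropagation of the result; the recommended move is the most-visited root child.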