In recent years, state-of-the-art game-playing agents have often involved policies trained in self-play processes, in which Monte Carlo tree search (MCTS) algorithms and trained policies iteratively improve each other. The strongest results have been obtained when policies are trained to mimic the search behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design, includes an element of exploration, policies trained in this manner are likely to exhibit a similar extent of exploration. In this paper, we are interested in learning policies for a project whose future goals include the extraction of interpretable strategies, rather than state-of-the-art game-playing performance. For these goals, we argue that such an extent of exploration is undesirable.
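For concreteness, the cross-entropy loss referred to above is, in AlphaZero-style training, taken between the normalised root visit counts of MCTS and the trained policy. A minimal NumPy sketch of this objective (the function name and the smoothing constant are ours, for illustration only):

    import numpy as np

    def mcts_cross_entropy_loss(visit_counts, policy_probs):
        # Target distribution: MCTS root visit counts N(s, a), normalised.
        target = visit_counts / visit_counts.sum()
        # Cross-entropy against the trained policy pi_theta(a | s);
        # the small constant guards against log(0) for unvisited actions.
        return -np.sum(target * np.log(policy_probs + 1e-12))

    # A policy that mirrors the search distribution incurs a low loss:
    counts = np.array([80.0, 15.0, 5.0])
    policy = np.array([0.80, 0.15, 0.05])
    print(mcts_cross_entropy_loss(counts, policy))

Because the visit-count target retains whatever exploration MCTS performed at the root, a policy minimising this loss inherits that exploration, which is the behaviour the abstract above argues against for interpretability purposes.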
Monte-Carlo Tree Search (MCTS) is a recent paradigm for game-tree search, which gradually b...
The aim of general game playing (GGP) is to create programs capable of playing a wide range of diffe...
Local, spatial state-action features can be used to effectively train linear policies from self-play...
Recent Reinforcement Learning methods have combined function approximation and Monte Carlo Tree Sear...
This paper proposes using a linear function approximator, rather than a deep neural network (DNN), t...
Expert Iteration (ExIt) is an effective framework for learning game-playing policies from self-play....
This paper proposes CARL, a pair of agents that apply reinforcement learning and function approximat...
Monte-Carlo Tree Search (MCTS) is a best-first search method guided by the results of Monte-Carlo si... [see the UCT sketch after these snippets]
Monte Carlo Tree Search (MCTS) with an appropriate tree policy may be used to approximate a minimax ...
Monte Carlo Tree Search (MCTS) is the state-of-the-art algorithm for General Game Playing (GGP). We ...
Playout Policy Adaptation (PPA) is a state-of-the-art strategy that has been proposed to control the...
Monte-Carlo Tree Search (MCTS) grows a partial game tree and uses a large number of random simulatio...
The success of Monte Carlo tree search (MCTS) in many games, where alpha-beta-based search has faile...
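Several of the snippets above describe MCTS as a best-first search that grows a partial game tree guided by random playouts. For reference, here is a minimal, self-contained UCT sketch on a toy game; the game, class names, and constants are illustrative and are not taken from any of the works cited above:

    import math, random

    class Node:
        # State: 'stones' left in the pile and the player to move (0 or 1).
        def __init__(self, stones, player, parent=None, move=None):
            self.stones, self.player = stones, player
            self.parent, self.move = parent, move
            self.children = []
            self.untried = [m for m in (1, 2, 3) if m <= stones]
            self.visits = 0
            self.wins = 0.0  # counted from the parent's (mover's) perspective

        def ucb1(self, c=math.sqrt(2)):
            return (self.wins / self.visits
                    + c * math.sqrt(math.log(self.parent.visits) / self.visits))

    def rollout(stones, player):
        # Random playout: whoever takes the last stone wins.
        while True:
            stones -= random.choice([m for m in (1, 2, 3) if m <= stones])
            if stones == 0:
                return player
            player = 1 - player

    def mcts(root_stones, root_player, iterations=2000):
        root = Node(root_stones, root_player)
        for _ in range(iterations):
            node = root
            # 1. Selection: descend through fully expanded nodes via UCB1.
            while not node.untried and node.children:
                node = max(node.children, key=Node.ucb1)
            # 2. Expansion: add one previously untried child, if any.
            if node.untried:
                m = node.untried.pop()
                child = Node(node.stones - m, 1 - node.player, parent=node, move=m)
                node.children.append(child)
                node = child
            # 3. Simulation: random playout (terminal nodes skip straight to scoring).
            if node.stones == 0:
                winner = 1 - node.player  # the previous mover took the last stone
            else:
                winner = rollout(node.stones, node.player)
            # 4. Backpropagation: credit wins from each node's mover's perspective.
            while node is not None:
                node.visits += 1
                if node.parent is not None and winner == node.parent.player:
                    node.wins += 1
                node = node.parent
        # Recommend the most-visited root move.
        return max(root.children, key=lambda n: n.visits).move

    print(mcts(10, 0))  # with 10 stones, optimal play removes 2 (leaving a multiple of 4)

Each iteration performs the four canonical phases: selection by UCB1, expansion of one child, a random playout, and backpropagation of the result; the recommended move is the most-visited root child.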