Från AlphaZero till alfahjälte : En förstudie om inklusion av additionella trädobservationer i straffinlärning

Carlsson, Fredrik
Öhman, Joey

Publication date

January 2019

Publisher

KTH, Skolan för elektroteknik och datavetenskap (EECS)

Abstract

In self-play reinforcement learning, an agent plays games against itself and, with the help of hindsight and retrospection, improves its policy over time. Using this premise, AlphaZero famously became the strongest known Go, shogi, and chess player by training a deep neural network on data collected solely through self-play. AlphaZero couples this deep neural network with a Monte Carlo Tree Search algorithm that drastically improves the network's initial policy and state evaluation. During training, AlphaZero relies on the final outcome of the game to generate training labels. By altering the learning target to instead make use of the improved state evaluation acquired after the tree search, the creation of training labels for ...
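The abstract contrasts two ways of labelling the value of each position in a self-play game. Under AlphaZero's scheme, every position is labelled with the final game outcome. Under the alternative, the label is the evaluation obtained after the tree search. The Python sketch below illustrates both. It is a minimal sketch under stated assumptions, not the thesis's implementation. The function names, the +1/0/-1 outcome convention seen from the player to move, and the interpolation `weight` between the two targets are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def outcome_targets(num_plies: int, final_result: float) -> List[float]:
    """AlphaZero-style labels: every position receives the final game outcome.

    Assumption: final_result is +1 (win), 0 (draw) or -1 (loss) from the
    perspective of the player who made the first move. In a two-player
    zero-sum game the sign flips on every ply, so each label is expressed
    from the perspective of the player to move at that position.
    """
    return [final_result if ply % 2 == 0 else -final_result
            for ply in range(num_plies)]


def value_targets(final_result: float,
                  root_values: Sequence[float],
                  weight: float) -> List[float]:
    """Blend outcome labels with post-search root value estimates.

    Hypothetical helper: root_values[i] is assumed to be the MCTS value
    estimate at the root after searching ply i, already expressed from the
    perspective of the player to move. weight=0.0 reproduces the pure
    outcome labels; weight=1.0 uses only the tree-search evaluation.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    z = outcome_targets(len(root_values), final_result)
    # Linear interpolation between the game outcome z and the search value q.
    return [(1.0 - weight) * z_i + weight * q_i
            for z_i, q_i in zip(z, root_values)]


# Usage: a 4-ply game won by the first player (illustrative numbers only).
roots = [0.4, -0.6, 0.8, -0.9]
print(value_targets(1.0, roots, 0.0))  # pure outcome labels
print(value_targets(1.0, roots, 1.0))  # pure search-value labels
```

With the outcome target, a single noisy result propagates to every position in the game. With the search-value target, each label instead reflects the tree search's estimate at that specific position, which is the kind of change in learning target the abstract describes.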
