Cheap but Clever: Human Active Learning in a Bandit Setting

Zhang, Shunan
Yu, Angela J

Publication date

January 2013

Publisher

Public Library of Science (PLoS)

Abstract

How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy outcomes, is an important problem in cognitive science. There are two interrelated questions: how humans represent information, both what has been learned and what can still be learned, and how they choose actions, in particular how they negotiate the tension between exploration and exploitation. In this work, we examine human behavioral data in a multi-armed bandit setting, in which the subject chooses one of four “arms” to pull on each trial and receives a binary outcome (win/lose). We implement the Bayes-optimal policy, which maximizes the expected cumulative reward in this finite-horizon bandit environment, as well as a variety of heu...