How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy outcomes, is an important problem in cognitive science. There are two interrelated questions: how humans represent information, both what has been learned and what can still be learned, and how they choose actions, in particular how they negotiate the tension between exploration and exploitation. In this work, we examine human behavioral data in a multi-armed bandit setting, in which the subject chooses one of four “arms” to pull on each trial and receives a binary outcome (win/lose). We implement both the Bayes-optimal policy, which maximizes the expected cumulative reward in this finite-horizon bandit environment, and a variety of heu...
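As a rough illustration of what a Bayes-optimal policy computes in a finite-horizon bandit with win/lose outcomes, the following Python sketch uses backward induction over Beta posteriors. The uniform Beta(1, 1) prior, the function name `plan`, and the short horizons are assumptions made for the example, not details taken from the work summarized above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def plan(state, trials_left):
    """Return (expected future wins, best arm) under optimal play.

    state: tuple of (alpha, beta) pairs, one Beta posterior per arm.
    Assumption for this sketch: every arm starts at Beta(1, 1).
    """
    if trials_left == 0:
        return 0.0, None
    best_value, best_arm = -1.0, None
    for arm, (a, b) in enumerate(state):
        p_win = a / (a + b)  # posterior mean win probability of this arm
        # Posterior after a win or a loss on this arm; other arms unchanged.
        win_state = state[:arm] + ((a + 1, b),) + state[arm + 1:]
        lose_state = state[:arm] + ((a, b + 1),) + state[arm + 1:]
        value = (p_win * (1.0 + plan(win_state, trials_left - 1)[0])
                 + (1.0 - p_win) * plan(lose_state, trials_left - 1)[0])
        if value > best_value:  # ties keep the lowest-numbered arm
            best_value, best_arm = value, arm
    return best_value, best_arm


if __name__ == "__main__":
    four_untried_arms = ((1, 1),) * 4
    print(plan(four_untried_arms, 2))
```

The recursion weighs the immediate chance of a win against the value of the information gained for later trials, which is where the exploration and exploitation trade-off enters. The memoized state space grows quickly with the number of trials, so this form is only practical for short horizons.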
This dissertation considers a particular aspect of sequential decision making under uncertainty in w...
How do humans search for rewards? This question is commonly studied using multi-armed bandit tasks, ...
We analyze the robustness of a knowledge gradient (KG) policy for the multi-armed bandit pro...
How humans achieve long-term goals in an uncertain environment, via repeated trials and noisy observ...
We study bandit problems in which a decision-maker gets reward-or-failure feedback when choosing rep...
We study learning in a bandit task in which the outcome probabilities of six arms switch (“jump”) ov...
The bandit problem is a dynamic decision-making task that is simply described, well-suited to contro...
We consider a class of multi-armed bandit problems where the reward obtained by pulling an a...
We present a formal model of human decision-making in explore-exploit tasks using the conte...
We study human learning and decision-making in tasks with probabilistic rewards. Recent studies in...
We consider a class of bandit problems in which a decision-maker must choose between a set of altern...