How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy outcomes, is an important problem in cognitive science. There are two interrelated questions: how humans represent information, both what has been learned and what can still be learned, and how they choose actions, in particular how they negotiate the tension between exploration and exploitation. In this work, we examine human behavioral data in a multi-armed bandit setting, in which the subject chooses one of four "arms" to pull on each trial and receives a binary outcome (win/lose). We implement the Bayes-optimal policy, which maximizes the expected cumulative reward in this finite-horizon bandit environment, as well as a variety...
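To make concrete what the Bayes-optimal policy involves, the sketch below shows one standard way to compute it for a finite-horizon Bernoulli bandit: dynamic programming over Beta-Bernoulli belief states, where each arm's posterior is summarized by its observed (win, loss) counts and the Bellman recursion maximizes expected cumulative reward over the remaining trials. This is a minimal illustration, not the paper's implementation; the Beta(1, 1) prior, the horizon of 8 trials, and the tie-breaking rule are all assumptions made here for the example. Because the belief-state space grows combinatorially with the horizon, exact dynamic programming of this kind is feasible only for short games.

```python
from functools import lru_cache

N_ARMS = 4    # four arms, matching the task described above
HORIZON = 8   # assumed short horizon; the belief-state space grows quickly

@lru_cache(maxsize=None)
def value(counts, trials_left):
    """Expected total future reward from belief state `counts` under optimal play."""
    if trials_left == 0:
        return 0.0
    return max(q_value(counts, trials_left, i) for i in range(len(counts)))

def q_value(counts, trials_left, i):
    """Expected total reward of pulling arm i now, then continuing optimally."""
    s, f = counts[i]
    p = (s + 1) / (s + f + 2)  # posterior mean win rate under a Beta(1, 1) prior
    win = counts[:i] + ((s + 1, f),) + counts[i + 1:]    # belief after a win on arm i
    lose = counts[:i] + ((s, f + 1),) + counts[i + 1:]   # belief after a loss on arm i
    return (p * (1.0 + value(win, trials_left - 1))
            + (1 - p) * value(lose, trials_left - 1))

def bayes_optimal_arm(counts, trials_left):
    """Arm attaining the maximum in the Bellman recursion (ties -> lowest index)."""
    return max(range(len(counts)), key=lambda i: q_value(counts, trials_left, i))

if __name__ == "__main__":
    start = ((0, 0),) * N_ARMS  # no observations yet on any arm
    print("expected cumulative reward:", value(start, HORIZON))
    print("optimal first pull:", bayes_optimal_arm(start, HORIZON))
```

Note that early in the game the recursion can prefer an arm with a lower posterior mean because pulling it also yields information that improves later choices; this is exactly the exploration-exploitation trade-off the abstract refers to.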