We study human learning and decision-making in tasks with probabilistic rewards. Recent studies in a 2-armed bandit task find that a modification of classical Q-learning algorithms, with outcome-dependent learning rates, better explains behavior than constant learning rates do. We propose a simple alternative: humans directly track the decision variable underlying choice in the task. Under this policy-learning perspective, asymmetric learning can be reinterpreted as increasing confidence in the preferred choice. We provide specific update rules for incorporating partial feedback (outcomes on the chosen arm) and complete feedback (outcomes on the chosen and unchosen arms), and show that our model consistently outperforms previously proposed ...
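The asymmetric-learning-rate baseline described above can be sketched as follows. This is a minimal illustration, not the exact specification from the cited studies: the parameter names (`alpha_pos`, `alpha_neg`, `beta`) and the logistic choice rule on the Q-value difference are illustrative assumptions.

```python
import math

def asymmetric_q_update(q, reward, alpha_pos, alpha_neg):
    # Outcome-dependent learning rates: take a different-sized step
    # after positive prediction errors than after negative ones.
    delta = reward - q                      # reward prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return q + alpha * delta

def choice_prob(q_chosen, q_other, beta):
    # Logistic (softmax) choice rule on the Q-value difference --
    # the decision variable that the policy-learning account
    # proposes people track directly rather than deriving from Q-values.
    return 1.0 / (1.0 + math.exp(-beta * (q_chosen - q_other)))
```

With `alpha_pos > alpha_neg`, rewards pull a value estimate up faster than failures pull it down; under the policy-learning reinterpretation, the same behavioral signature arises from growing confidence in the currently preferred arm.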
Humans are often faced with an exploration-versus-exploitation trade-off. A commonly used paradigm, ...
How people achieve long-term goals in an imperfectly known environment, via repeated tries and noisy...
Humans frequently overestimate the likelihood of desirable events while underestimating the likeliho...
We study bandit problems in which a decision-maker gets reward-or-failure feedback when choosing rep...
How do people learn? We assess, in a model-free manner, subjects' belief dynamics in a two-armed ban...
The bandit problem is a dynamic decision-making task that is simply described, well-suited to contro...
Many of the decisions we make in our everyday lives are sequential and entail ...
Computational models of learning have proved largely successful in characterizing potential mechanis...
Research in cognitive psychology regarding sequential decision-making usually involves tasks where a...
In real-life decision environments people learn from their direct experience with alternative cours...
Aim: The nature of attention, and how it interacts with learning and choice processes in the context...
An n-armed bandit task was used to investigate the trade-off between exploratory (choosing lesser-kn...