Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning. Here, we lay out a family of approaches by which model-based computation may be built upon a core of TD learning. The foundation of this framework is the successor representation, a predictive state representation that, when c...
<p>To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult w...
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in re...
Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We ...
Humans and animals are capable of evaluating actions by considering their long-run future rewards th...
The ability to predict long-term future rewards is crucial for survival. Some animals may have to ...
Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selection pol...
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only bee...
Theories of reward learning in neuroscience have focused on two families of algorithms, thought to c...
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only bee...
SummaryReinforcement learning (RL) uses sequential experience with situations (“states”) and outcome...
Despite many debates in the first half of the twentieth century, it is now largely a truism that hum...
Reinforcement learning (RL) uses sequential experience with situations (“states”) and outcomes to as...
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in t...
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only bee...
National audienceAccounting for behavioural capabilities and flexibilities experimentally observed i...
<p>To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult w...
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in re...
Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We ...
Humans and animals are capable of evaluating actions by considering their long-run future rewards th...
The ability to predict long-term future rewards is crucial for survival. Some animals may have to ...
Reinforcement Learning (RL) algorithms allow artificial agents to improve their action selection pol...
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only bee...
Theories of reward learning in neuroscience have focused on two families of algorithms, thought to c...
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only bee...
SummaryReinforcement learning (RL) uses sequential experience with situations (“states”) and outcome...
Despite many debates in the first half of the twentieth century, it is now largely a truism that hum...
Reinforcement learning (RL) uses sequential experience with situations (“states”) and outcomes to as...
We develop a novel, biologically detailed neural model of reinforcement learning (RL) processes in t...
Animals repeat rewarded behaviors, but the physiological basis of reward-based learning has only bee...
National audienceAccounting for behavioural capabilities and flexibilities experimentally observed i...
<p>To behave adaptively, we must learn from the consequences of our actions. Doing so is difficult w...
Temporal-difference (TD) learning can be used not just to predict rewards, as is commonly done in re...
Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We ...