(A) Mean performance of Q-learning on non-terminal pairs throughout training and testing, sorted by symbolic distance. (B) Mean performance of Q-learning for each of the 21 pairs at transfer, sorted by symbolic distance. Critical pairs are shaded in gray. (C) Mean performance of Q-learning for each of the 21 pairs at the end of training. (D) to (F) As above, but reporting simulated performance of the Value Transfer Model (VTM). (G) to (I) As above, but reporting simulated performance of RL-REMERGE.
<p>Three phases were included for each algorithm: 200 trials of adjacent pairs only, followed by 200...
<p>Estimates compare performance by subjects (blue lines) to those generated by simulations using ea...
We study the sample complexity of teaching, termed the "teaching dimension" (TDim) in the literature...
During training, two 5-item lists were trained (adjacent pairs only). In the Linking condition, acto...
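The training protocol described above can be sketched as follows. This is a minimal illustration, not the study's actual stimulus code: the item labels, the choice of linking premise, and the trial-sampling procedure are all assumptions made for the example.

```python
import random

def adjacent_pairs(items):
    """All adjacent premise pairs for one ordered list (higher-ranked item first)."""
    return [(items[i], items[i + 1]) for i in range(len(items) - 1)]

# Two hypothetical 5-item lists (labels are illustrative, not from the study).
list1 = ["A", "B", "C", "D", "E"]
list2 = ["F", "G", "H", "I", "J"]

# Training presents adjacent pairs only, as in the text.
training_pairs = adjacent_pairs(list1) + adjacent_pairs(list2)

# In a Linking condition, one premise joining the two lists (here assumed
# to be E > F) would allow them to merge into a single 10-item order.
linking_pair = ("E", "F")

def sample_trial(pairs, rng=random):
    """Draw one training trial: a premise pair in random left/right order."""
    hi, lo = rng.choice(pairs)
    left, right = (hi, lo) if rng.random() < 0.5 else (lo, hi)
    return left, right, hi  # hi is the rewarded choice
```

Each 5-item list contributes four adjacent premises, so training draws from eight pairs (nine in the Linking condition).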
(A) Mean performance of RL-Elo on non-terminal pairs throughout training and testing, sorted by symb...
Previous studies have shown that training a reinforcement model for the sorting problem takes very l...
In this paper we examine the behaviour of one such model-free algorithm, Q(λ) [2]. This algorithm shows ...
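To make the model-free setting concrete, here is a minimal tabular Q-learning sketch for a pairwise-choice task of this kind. It is a simplified stand-in, not the paper's Q(λ) implementation: it omits eligibility traces, and it assumes a per-item value table and an ε-greedy policy, neither of which is specified in the excerpt above.

```python
import random
from collections import defaultdict

def q_learning_pair_task(pairs, n_trials=1000, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning sketch for a pairwise stimulus-choice task.

    Each trial presents one premise pair (hi, lo); choosing the
    higher-ranked item is rewarded.  For simplicity, a single value is
    learned per stimulus -- one possible state encoding among several.
    """
    rng = random.Random(seed)
    Q = defaultdict(float)  # learned value of choosing each stimulus
    for _ in range(n_trials):
        hi, lo = rng.choice(pairs)
        options = [hi, lo]
        if rng.random() < epsilon:
            choice = rng.choice(options)          # explore
        else:
            choice = max(options, key=lambda s: Q[s])  # exploit
        reward = 1.0 if choice == hi else 0.0
        # Single-step (bandit-style) update toward the obtained reward.
        Q[choice] += alpha * (reward - Q[choice])
    return Q
```

Trained on adjacent pairs only, such a learner acquires high values for items near the top of the list and low values near the bottom, which is the kind of value gradient that value-transfer accounts exploit to explain transfer performance.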
<p>Performance on non-terminal stimulus pairs (i.e. those excluding stimuli <em>A</em> and <em>G</em...
<p>Simulated response accuracy for all stimulus pairs of a seven-item list using betasort (red), bet...
This paper describes several new online model-free reinforcement learning (RL) algorithms. We design...
The trial-and-error learning task was performed by N = 85 subjects. For each subject, it was tested ...