It is well known that quantifying uncertainty in action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively tractable way of doing this using randomized value functions, but it still requires substantial computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm that learns a parameterized indexed value function through a distributional version of temporal difference learning in a tabular setting, and we prove its regret bound. Then, in a computa...
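The idea of acting greedily with respect to an indexed value function can be illustrated with a minimal tabular sketch. The abstract does not specify the parameterization, so the linear form Q_z(s, a) = mu(s, a) + sigma(s, a) * z with a standard normal index z is an assumption made here for illustration only, and all names (`mu`, `sigma`, `sample_index`, `indexed_q`, `greedy_action`) are hypothetical.

```python
import random


def sample_index(rng):
    """Draw a scalar index z ~ N(0, 1) (assumed index distribution)."""
    return rng.gauss(0.0, 1.0)


def indexed_q(mu, sigma, state, action, z):
    """Indexed action value: mean estimate plus index-scaled uncertainty.

    The linear form mu + sigma * z is an assumption for this sketch,
    not a form stated in the abstract.
    """
    return mu[state][action] + sigma[state][action] * z


def greedy_action(mu, sigma, state, z):
    """Pick the action that maximizes the indexed value for a fixed index z."""
    n_actions = len(mu[state])
    return max(range(n_actions), key=lambda a: indexed_q(mu, sigma, state, a, z))


# Toy table with one state and two actions (hypothetical numbers).
# Action 0 has the higher mean; action 1 is far more uncertain.
mu = [[1.0, 0.8]]
sigma = [[0.05, 0.5]]

rng = random.Random(0)
counts = [0, 0]
for _ in range(1000):
    z = sample_index(rng)  # resample the index, then act greedily
    counts[greedy_action(mu, sigma, 0, z)] += 1
```

In this toy table the uncertain action is selected whenever the sampled index is large enough to outweigh its lower mean, so exploration comes from resampling a single scalar rather than from maintaining an ensemble of value functions.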
How does the uncertainty of the value function propagate when performing temporal difference learnin...
Reinforcement learning systems are often concerned with balancing exploration of untested actions ag...
This paper studies directed exploration for reinforcement learning agents by tracking uncertainty ab...
Reinforcement learning aims to derive an optimal policy for an often initially unknown en...
The dilemma between exploration and exploitation is an important topic in reinforcement learning (RL...
We consider the problem of reinforcement learning with an orientation toward contexts in which an ag...
In reinforcement learning, the updating of the value functions determines the information spreading a...
Since exact training and inference are not possible for most factor graphs, a number of techniques h...
Deep, model-based reinforcement learning has shown state-of-the-art, human-exceeding performance in ...
The problem of reinforcement learning in a non-Markov environment is explored using a dynami...
Reinforcement learning models generally assume that a stimulus is presented that allows a learner to...