Efficient planning plays a crucial role in model-based reinforcement learning. Traditionally, the main planning operation is a full backup based on the current estimates of the successor states. Consequently, its computation time is proportional to the number of successor states. In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has a computation time independent of the number of successor states. This new backup, which we call a small backup, opens the door to a new class of model-based reinforcement learning methods that exhibit much finer control over their planning process than traditional methods. We empirically demonstrate that this increased flexibility allows for mor...
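The contrast the abstract draws can be made concrete with a minimal sketch (not the paper's code): a full backup sums over every successor state, while a small backup folds in the value change of a single successor in constant time. All names (P, R, V, Q, gamma) are illustrative placeholders, not the paper's notation.

```python
# Hedged sketch of full vs. small backups for a tabular MDP.
# P[(s, a)] maps each successor s2 to its transition probability,
# R[(s, a, s2)] is the reward, V holds state values, Q holds action values.
gamma = 0.9

def full_backup(V, P, R, s, a):
    """Full backup: expectation over every successor, O(|successors|)."""
    return sum(p * (R[(s, a, s2)] + gamma * V[s2])
               for s2, p in P[(s, a)].items())

def small_backup(Q, V, P, s, a, s2, v_old):
    """Small backup: fold the change of one successor's value into the
    stored Q estimate, O(1) regardless of how many successors exist."""
    return Q[(s, a)] + gamma * P[(s, a)][s2] * (V[s2] - v_old)
```

Under these assumptions, applying a small backup for each successor whose value changed reproduces the result of a fresh full backup, but each individual update costs constant time, which is what enables the finer-grained planning control the abstract describes.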
The successes of deep Reinforcement Learning (RL) are limited to settings where we have a large stre...
Search-based planners such as A* and Dijkstra's algorithm are proven methods for guiding today\u2...
We address the problem of computing an optimal value function for Markov decision processes. Since ...
Recent advancements in model-based reinforcement learning have shown that the dynamics of many struc...
PAC-MDP algorithms are particularly efficient in terms of the number of samples obtained from the e...
Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a k...
Abstract. Reinforcement learning (RL) involves sequential decision making in uncertain environments....
Prioritisation of Bellman backups or updating only a small subset of actions represent important tec...
This paper investigates a new approach to model-based reinforcement learning using background planni...
Partial order planning is an important approach that solves planning problems without completely spe...
Planning and reinforcement learning are two key approaches to sequential decision making. Multi-step...
Models of dynamical systems based on predictive state representations (PSRs) use predictions of fut...
We introduce Dynamic Planning Networks (DPN), a novel architecture for deep reinforcement learning, ...
We introduce an algorithm for model-based hierarchical reinforcement learning to acquire self-contai...