We present a horizon-based value iteration algorithm called Reverse Value Iteration (RVI). Empirical results on a variety of domains, both synthetic and real, show RVI often yields speedups of several orders of magnitude. RVI does this by ordering backups by horizons, with preference given to closer horizons, thereby avoiding many unnecessary and incorrect backups. We also compare to related work, including prioritized and partitioned value iteration approaches, and show that our technique performs favorably. The techniques presented in RVI are complementary and can be used in conjunction with previous techniques. We prove that RVI converges and often has better (but never worse) complexity than standard value iteration. To the a...
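The ordering idea described in the abstract can be illustrated with a small sketch. This is not the authors' RVI implementation. It is a plain Gauss-Seidel value iteration on a hypothetical six-state chain, where "horizon" is assumed to mean the fewest steps from a state to the goal. The names `horizons`, `value_iteration`, and `chain`, the unit step cost, and the undiscounted setting are all assumptions made for illustration.

```python
from collections import deque


def horizons(transitions, goal):
    """Fewest steps from each state to the goal, via BFS over reversed edges.

    Assumption: this step count stands in for the "horizon" of a state.
    """
    preds = {}
    for s, actions in transitions.items():
        for outcomes in actions.values():
            for s2, _prob in outcomes:
                preds.setdefault(s2, set()).add(s)
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        s = queue.popleft()
        for p in preds.get(s, ()):
            if p not in dist:
                dist[p] = dist[s] + 1
                queue.append(p)
    return dist


def value_iteration(transitions, goal, order, gamma=1.0, eps=1e-9, step_cost=1.0):
    """In-place (Gauss-Seidel) value iteration minimising expected cost-to-go.

    States are backed up in the given `order`; returns (values, sweep count).
    """
    V = {s: 0.0 for s in transitions}
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        for s in order:
            if s == goal:
                continue  # the goal is absorbing with zero cost
            best = min(
                step_cost + gamma * sum(p * V[s2] for s2, p in outcomes)
                for outcomes in transitions[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V, sweeps


# Hypothetical chain 5 -> 4 -> ... -> 0, with state 0 as the goal.
chain = {0: {}}
for s in range(1, 6):
    chain[s] = {"left": [(s - 1, 1.0)]}

h = horizons(chain, goal=0)
near_first = sorted(h, key=h.get)        # closest horizon first
far_first = list(reversed(near_first))   # deliberately poor ordering

V_near, sweeps_near = value_iteration(chain, 0, near_first)
V_far, sweeps_far = value_iteration(chain, 0, far_first)
print(sweeps_near, sweeps_far)
```

On this toy chain both orderings reach the same values, but backing up the states nearest the goal first needs 2 sweeps while the reversed ordering needs 6, because each backup in the near-first sweep already reads a converged successor value. The speedups reported in the abstract concern the authors' own algorithm and domains, not this sketch.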
Q-Learning is based on value iteration and remains the most popular choice for solving Marko...
Value function iteration is one of the standard tools for the solution of dynamic general equilibriu...
We present a classification-based policy iteration algorithm, called Direct Po...
Prioritisation of Bellman backups or updating only a small subset of actions represent important tec...
Value iteration is a fundamental algorithm for solving Markov Decision Processes (MDPs). It computes...
Value iteration (VI) is a foundational dynamic programming method, important for learning and planni...
Temporally extended actions have proven useful for reinforcement learning, but their duration also m...
This paper is concerned with the links between the Value Iteration algorithm and the Rolling Horizon...
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value it...
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI ...
Several researchers have shown that the efficiency of value iteration, a dynamic programming algorit...
In this paper, we examine the intuition that TD(λ) is meant to operate by approximating asynchronous ...
We survey value iteration algorithms on graphs. Such algorithms can be used for determini...