In this paper, we examine the intuition that TD(λ) operates by approximating asynchronous value iteration. We note that on the important class of discrete acyclic stochastic tasks, value iteration is inefficient compared with the DAG-SP algorithm, which works backwards from the goal and thereby performs essentially one sweep instead of many. The question we address is whether there is an analogous algorithm that can be used in large stochastic state spaces requiring function approximation. We present such an algorithm, analyze it, and give comparative results against TD(λ) on several domains.

LEARNING CONTROL BACKWARDS

Computing an accurate value function is the key to dynamic-programming-based algorithms for optimal sequential...
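To make the contrast concrete, here is a minimal sketch in Python of the two strategies on a toy acyclic stochastic task (the four-state MDP and all function names are illustrative assumptions, not taken from the paper). Ordinary value iteration repeats full sweeps until the values stop changing, whereas a DAG-SP-style pass backs up each state exactly once, in reverse topological order, so every successor's value is already final when it is used.

```python
# Toy acyclic stochastic MDP (illustrative assumption, not from the paper):
# transitions[state][action] = list of (probability, next_state, reward).
# State 3 is the terminal goal state.
transitions = {
    0: {"a": [(0.8, 1, -1.0), (0.2, 2, -1.0)], "b": [(1.0, 2, -2.0)]},
    1: {"a": [(1.0, 3, -1.0)]},
    2: {"a": [(0.5, 1, -1.0), (0.5, 3, -3.0)]},
    3: {},  # terminal: no actions
}

def backup(state, V):
    """One Bellman optimality backup: best expected reward-plus-value over actions."""
    return max(
        sum(p * (r + V[s2]) for p, s2, r in outcomes)
        for outcomes in transitions[state].values()
    )

def value_iteration(tol=1e-8):
    """Repeated full sweeps over all states until the value function converges."""
    V = {s: 0.0 for s in transitions}
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        for s in transitions:
            if transitions[s]:  # skip terminal states
                v = backup(s, V)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
        if delta < tol:
            return V, sweeps

def dag_sp_sweep():
    """DAG-SP style: a single backward pass in reverse topological order.

    Because the task is acyclic, every successor's value is already final
    when a state is backed up, so one sweep computes the exact values.
    """
    order = [0, 2, 1, 3]  # a topological order of the toy MDP above
    V = {s: 0.0 for s in transitions}
    for s in reversed(order):
        if transitions[s]:
            V[s] = backup(s, V)
    return V

if __name__ == "__main__":
    V_vi, sweeps = value_iteration()
    V_dag = dag_sp_sweep()
    print("value iteration:", V_vi, f"({sweeps} sweeps)")
    print("DAG-SP sweep:   ", V_dag, "(1 sweep)")
```

On this example the single backward pass reproduces the fixed point that value iteration reaches only after multiple sweeps, and the saving grows with the depth of the DAG, since value iteration propagates information back from the goal one sweep at a time.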