Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
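The class of processes covered by a theorem of this kind can be illustrated, in rough form, by the following iteration (a sketch in standard stochastic-approximation notation; the paper's exact statement and assumptions may differ):

\[
\Delta_{t+1}(x) \;=\; \bigl(1 - \alpha_t(x)\bigr)\,\Delta_t(x) \;+\; \beta_t(x)\,F_t(x),
\]

which converges to zero with probability 1 provided, among other conditions, that the step sizes satisfy \(\sum_t \alpha_t(x) = \infty\), \(\sum_t \alpha_t^2(x) < \infty\) (and similarly for \(\beta_t\)), and the expected update contracts in a weighted maximum norm, \(\bigl\lVert \mathbb{E}[F_t(x) \mid P_t] \bigr\rVert_W \le \gamma \lVert \Delta_t \rVert_W\) with \(\gamma < 1\), with suitably bounded variance. Q-learning fits this template with \(\Delta_t = Q_t - Q^*\).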
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptot...
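For orientation, the ODE method referenced here analyzes recursions of the standard form (a hedged sketch of the usual setting, not the paper's exact hypotheses):

\[
x_{n+1} \;=\; x_n + a_n\bigl[h(x_n) + M_{n+1}\bigr],
\]

where \(M_{n+1}\) is martingale-difference noise and \(a_n\) are diminishing step sizes. The iterates track the limiting ODE \(\dot{x} = h(x)\), and almost sure boundedness is linked to the scaled vector field \(h_\infty(x) = \lim_{c \to \infty} h(cx)/c\): if the origin is globally asymptotically stable for \(\dot{x} = h_\infty(x)\), the iterates remain bounded.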
Stochastic optimal control studies the problem of sequential decision-making under uncertainty. Dyna...
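The dynamic-programming machinery alluded to here rests on the Bellman optimality equation; as a standard illustration (notation introduced here, not taken from the abstract), for a discounted Markov decision problem

\[
V^*(s) \;=\; \max_{a}\Bigl[\, r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\, V^*(s') \,\Bigr],
\]

and value iteration computes \(V_{k+1} = T V_k\), where the Bellman operator \(T\) is a \(\gamma\)-contraction in the maximum norm, so the iterates converge to \(V^*\).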
We are interested in understanding stability (almost sure boundedness) of stochastic approximation a...
We address the problem of computing the optimal Q-function in Markov decision problems with infinit...
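For reference, the optimal Q-function in the discounted case is the unique fixed point of the corresponding Bellman operator, and a greedy policy with respect to it is optimal (a standard statement; the paper's precise infinite-horizon setting is truncated above):

\[
Q^*(s,a) \;=\; r(s,a) + \gamma \sum_{s'} p(s' \mid s, a)\,\max_{a'} Q^*(s', a'),
\qquad
\pi^*(s) \in \arg\max_{a} Q^*(s,a).
\]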
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Ma...
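As a concrete illustration of the update rule behind the algorithm described here, a minimal tabular Q-learning sketch follows. The `env` object, its `reset()`/`step(action)` interface returning `(next_state, reward, done)`, and all parameter values are assumptions of this illustration, not code from the paper:

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state, done = env.reset(), False  # hypothetical env: reset() returns a state index
        while not done:
            # Epsilon-greedy exploration over the current Q estimates.
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                action = int(np.argmax(Q[state]))
            # hypothetical env: step() returns (next_state, reward, done)
            next_state, reward, done = env.step(action)
            # One-step temporal-difference target and update.
            target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```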
Along with the sharp increase in visibility of the field, the rate at which ne...
We provide some general results on the convergence of a class of stochastic approximation...
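A typical setting for results of this kind is the asynchronous, componentwise recursion below (a hedged sketch of the usual formulation, not necessarily the paper's exact one):

\[
x_{t+1}(i) \;=\; x_t(i) + \alpha_t(i)\bigl( F_i(x_t) - x_t(i) + w_t(i) \bigr),
\]

where each component \(i\) is updated at its own times with its own step sizes, \(w_t(i)\) is zero-mean noise, and \(F\) is a contraction (for Q-learning, the Bellman optimality operator in a weighted maximum norm); convergence to the fixed point \(x^* = F(x^*)\) then follows under the usual step-size conditions.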
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision P...
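For orientation, in the discounted semi-Markov setting the Bellman equation accounts for the random sojourn time \(\tau\) between decisions (a standard formulation, stated here as background rather than quoted from the paper):

\[
Q^*(s,a) \;=\; \mathbb{E}\!\left[\, \int_0^{\tau} e^{-\beta t}\, r_t \, dt \;+\; e^{-\beta \tau} \max_{a'} Q^*(s', a') \;\middle|\; s, a \right],
\]

so the effective discount factor \(e^{-\beta \tau}\) varies with the time spent in each state, with \(\beta > 0\) the continuous-time discount rate.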
In this paper, we present a brief survey of reinforcement learning, with particular emphasis on stoc...