Abstract

We provide some general results on the convergence of a class of stochastic approximation algorithms and their parallel and asynchronous variants. We then use these results to study the Q-learning algorithm, a reinforcement learning method for solving Markov decision problems, and establish its convergence under conditions more general than previously available.
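As a hedged illustration of the algorithm the abstract studies, the sketch below implements tabular Q-learning in Python. The environment interface (reset/step), the parameter values, and the name q_learning are illustrative assumptions, not details taken from the paper.

import numpy as np

# Minimal sketch of tabular Q-learning (Watkins, 1989). Each update touches a
# single (state, action) entry of Q, which is what makes the method an
# asynchronous stochastic approximation scheme of the kind analyzed above.
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()          # assumed: returns an integer state index
        done = False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)   # assumed interface
            target = r + (0.0 if done else gamma * float(np.max(Q[s_next])))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q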
A class of nonlinear learning algorithms for the Q- and S-model stochastic automaton-random environme...
We propose for risk-sensitive control of finite Markov chains a counterpart of the popular Q-learnin...
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Ma...
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptot...
It is shown that the stability of the stochastic approximation algorithm is implied by the asymptoti...
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodo...
We discuss synchronous and asynchronous iterations of the form x_{k+1} = x_k + γ(k)(h(x_k) + w_k), where ...
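The iteration above is easy to simulate. The sketch below is a minimal synchronous instance in Python; the choice h(x) = -x (with root x* = 0), the Gaussian noise model, and the step size γ(k) = 1/k, which satisfies the usual conditions Σγ(k) = ∞ and Σγ(k)² < ∞, are illustrative assumptions rather than details from the abstract.

import numpy as np

# Sketch of the iteration x_{k+1} = x_k + γ(k)(h(x_k) + w_k).
# h and the noise distribution are assumptions made for illustration.
def stochastic_approximation(h, x0, n_steps=10000, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_steps + 1):
        gamma_k = 1.0 / k                             # diminishing step size
        w_k = noise_std * rng.standard_normal(x.shape)  # zero-mean noise
        x = x + gamma_k * (h(x) + w_k)
    return x

# Example: h(x) = -x drives the iterate toward the root x* = 0.
print(stochastic_approximation(lambda x: -x, x0=[1.0, -2.0]))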
Recent developments in the area of reinforcement learning have yielded a number of new algorithms fo...
We address the problem of computing the optimal Q-function in Markov decision problems with infinit...
Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision P...
In this paper, we present a brief survey of reinforcement learning, with particular emphasis on stoc...
We consider the stochastic shortest path problem, a classical finite-state Markovian decision proble...
This paper gives the first rigorous convergence analysis of analogues of Watkins's Q-learning algori...
This paper investigates reinforcement learning problems where a stochastic time delay is present in ...