It is shown that the stability of the stochastic approximation algorithm is implied by the asymptotic stability of the origin for an associated ODE. This, in turn, implies convergence of the algorithm. Several specific classes of algorithms are considered as applications. The results provide: (i) a simpler derivation of known results for reinforcement learning algorithms; (ii) the first proof that a class of asynchronous stochastic approximation algorithms is convergent without any a priori assumption of stability; and (iii) the first proof that asynchronous adaptive critic and Q-learning algorithms are convergent for the average cost optimal control problem.
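The two-step argument (stability from an ODE at the origin, then convergence) can be illustrated numerically. A minimal sketch, assuming the standard formulation of this result as we understand it: the iteration is x_{n+1} = x_n + a(n)(h(x_n) + M_{n+1}) with martingale-difference noise M, the "associated ODE" is dx/dt = h_inf(x) with h_inf(x) = lim_{r -> inf} h(r x)/r, and convergence then follows from a globally asymptotically stable equilibrium x* of dx/dt = h(x). The scalar field h(x) = -(x - 2), the Gaussian noise, and the step sizes a(n) = 1/(n+1) are hypothetical choices made only for this illustration.

```python
import random


def h(x: float) -> float:
    """Hypothetical mean field h(x) = -(x - 2); the ODE dx/dt = h(x) has the
    globally asymptotically stable equilibrium x* = 2."""
    return -(x - 2.0)


def h_scaled(x: float, r: float) -> float:
    """Scaled field h_r(x) = h(r * x) / r. For this h it equals -x + 2/r, so
    h_inf(x) = -x and the origin of dx/dt = h_inf(x) is globally asymptotically stable."""
    return h(r * x) / r


def stochastic_approximation(x0: float, n_steps: int, seed: int = 0) -> float:
    """Iterate x_{n+1} = x_n + a(n) * (h(x_n) + M_{n+1}) and return the last iterate."""
    rng = random.Random(seed)
    x = x0
    for n in range(n_steps):
        a_n = 1.0 / (n + 1)           # sum of a(n) diverges, sum of a(n)^2 converges
        m_next = rng.gauss(0.0, 1.0)  # i.i.d. zero-mean noise, a martingale difference sequence
        x += a_n * (h(x) + m_next)
    return x


if __name__ == "__main__":
    for r in (1.0, 10.0, 1000.0):
        # h_r(1) approaches h_inf(1) = -1 as r grows
        print(f"h_r(1) with r={r}: {h_scaled(1.0, r):.4f}")
    print(f"final iterate: {stochastic_approximation(x0=50.0, n_steps=100_000):.4f}")
```

With a(n) = 1/(n+1) and this linear h, the iterate after N steps equals 2 plus the sample mean of the first N noise terms, so the approach to x* = 2 is easy to see even from a distant starting point such as x0 = 50.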
Recent developments in the area of reinforcement learning have yielded a number of new algorithms ...
This paper gives the first rigorous convergence analysis of analogues of Watkins's Q-learning algori...
We present the first sufficient conditions that guarantee stability of two-timescale stochastic appr...
It is shown here that stability of the stochastic approximation algorithm is implied by the asymptot...
Includes bibliographical references (p. 18-20). Supported by the National Science Foundation. ECS-921...
We provide some general results on the convergence of a class of stochastic approximation...
Recent developments in the area of reinforcement learning have yielded a number of new algorithms fo...
Along with the sharp increase in visibility of the field, the rate at which ne...
Abstract (abridged): Many of the present problems in automatic control, economic systems and living o...
We discuss synchronous and asynchronous iterations of the form x_{k+1} = x_k + γ(k)(h(x_k) + w_k), where ...
This paper considers a class of reinforcement-learning that belongs to the family of Learning Automa...
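Item (ii) of the abstract and the related work on synchronous and asynchronous iterations of the form x_{k+1} = x_k + γ(k)(h(x_k) + w_k) concern schemes in which only some components change at each step. A minimal sketch of that asynchronous setting, under hypothetical choices not taken from the source: h(x) = T(x) - x with T(x) = b + 0.5 P x, where b = (1, 3) and P is the 2 × 2 matrix with every entry 1/2 (so T is a sup-norm contraction with unique fixed point x* = (3, 5)); one uniformly random component is updated per step; and that component's step size is 1/ν, where ν counts its own updates (a "local clock"). In this toy case every component is updated with positive frequency, so the iterates should approach x* just as a synchronous scheme would; the code illustrates the setting only and is not the paper's proof technique.

```python
import random

# Hypothetical example: T(x) = b + 0.5 * P x with P row-stochastic, so T is a
# sup-norm contraction and h(x) = T(x) - x has the unique root x* = (3, 5).
B = (1.0, 3.0)
P = ((0.5, 0.5), (0.5, 0.5))


def T(x: list[float], i: int) -> float:
    """i-th component of T(x) = b + 0.5 * P x."""
    return B[i] + 0.5 * sum(P[i][j] * x[j] for j in range(len(x)))


def async_sa(n_steps: int, seed: int = 0) -> list[float]:
    """Asynchronous stochastic approximation: at each step one randomly chosen
    component i is updated with step size 1 / (number of updates of i so far)."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    counts = [0, 0]                  # local clocks: updates received by each component
    for _ in range(n_steps):
        i = rng.randrange(len(x))    # component updated at this step
        counts[i] += 1
        a = 1.0 / counts[i]          # step size indexed by the local clock
        noise = rng.gauss(0.0, 1.0)  # zero-mean measurement noise
        # right-hand side is evaluated with the current x before x[i] changes
        x[i] += a * (T(x, i) - x[i] + noise)
    return x


if __name__ == "__main__":
    print(async_sa(100_000))
```

Indexing the step size by each component's own update count, rather than by the global step counter, keeps every component's effective learning rate comparable even when components are visited at different rates.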