For risk-sensitive control of finite Markov chains, we propose a counterpart of the popular Q-learning algorithm for classical Markov decision processes. The algorithm is shown to converge with probability one to the desired solution. The proof technique is an adaptation of the o.d.e. approach for the analysis of stochastic approximation algorithms, with most of the effort devoted to analyzing the specific o.d.e.s that arise.
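The following minimal Python sketch illustrates the kind of iteration involved. It rests on several assumptions that the abstract does not state:

- The risk-sensitive cost is the exponential growth rate lim (1/n) log E[exp(c(X_0, Z_0) + ... + c(X_{n-1}, Z_{n-1}))].
- The update is multiplicative and relative-value-style. Its fixed point satisfies Q(i, u) = exp(c(i, u)) * sum_j p(j | i, u) * min_v Q(j, v) / Q(i0, u0), so log Q(i0, u0) estimates the optimal cost.
- The function name `risk_sensitive_q_learning`, the reference pair `ref`, the synchronous sweep over all state-action pairs and the step-size schedule are hypothetical choices made for illustration. They are not the algorithm as specified in the paper.

```python
import math
import random


def risk_sensitive_q_learning(P, cost, ref=(0, 0), n_iter=20000, seed=0):
    """Synchronous sketch of a multiplicative, RVI-style risk-sensitive Q-learning.

    P[i][u]    -- transition probabilities p(j | i, u) over next states j
    cost[i][u] -- one-step cost c(i, u)
    ref        -- hypothetical reference pair (i0, u0) used to normalize iterates
    Returns (Q, log_lam), where log_lam = log Q(i0, u0) estimates the optimal
    exponential growth rate of E[exp(sum of costs)] (assumed criterion).
    """
    rng = random.Random(seed)
    n_states = len(P)
    # Strictly positive initial table; each update is a convex combination of
    # positive numbers (step size < 1), so Q stays positive.
    Q = [[1.0] * len(P[i]) for i in range(n_states)]
    for n in range(1, n_iter + 1):
        a = 100.0 / (100.0 + n)  # step sizes: sum diverges, sum of squares converges
        norm = Q[ref[0]][ref[1]]  # multiplicative analogue of subtracting Q(i0, u0)
        new_Q = [row[:] for row in Q]
        for i in range(n_states):
            for u in range(len(P[i])):
                # Simulate one transition from (i, u) instead of using P directly.
                j = rng.choices(range(n_states), weights=P[i][u])[0]
                target = math.exp(cost[i][u]) * min(Q[j]) / norm
                new_Q[i][u] = Q[i][u] + a * (target - Q[i][u])
        Q = new_Q
    return Q, math.log(Q[ref[0]][ref[1]])


if __name__ == "__main__":
    # One state, two self-looping actions with costs 2 and 1: the cheaper one wins.
    Q, log_lam = risk_sensitive_q_learning(P=[[[1.0], [1.0]]], cost=[[2.0, 1.0]])
    print(round(log_lam, 6))  # 1.0
```

The update is positively homogeneous in Q, so the multiplicative fixed-point equation only determines Q up to scale. Dividing by Q(i0, u0) plays the role that subtracting a reference value plays in relative value iteration for average cost: it pins down that scale and yields the growth rate.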
We present a novel multi-timescale Q-learning algorithm for average cost control in a Markov decisio...
We consider the problem of "optimal learning" for Markov decision processes with uncertain...
Abstract — Q-learning is a technique used to compute an optimal policy for a controlled Markov chai...
Q-learning (Watkins, 1989) is a simple way for agents to learn how to act optimally in controlled Ma...
A linear function approximation-based reinforcement learning algorithm is proposed for Markov decisi...
We address the problem of computing the optimal Q-function in Markov decision problems with infinit...
We propose for risk-sensitive control of finite Markov chains a counterpart of the popular 'actor-cr...
Includes bibliographical references (p. 18-20). Supported by the National Science Foundation. ECS-921...
A simulation-based algorithm for learning good policies for a discrete-time stochastic control proce...
Abstract — In recent work it was shown that a deterministic analog of stochastic approximation can b...
Abstract — We provide some general results on the convergence of a class of stochastic approximation...