We present a sampling algorithm, called the "Recursive Automata Sampling Algorithm" (RASA), for the control of finite-horizon Markov decision processes. RASA recursively extends the Pursuit learning-automata algorithm of Rajaraman and Sastry [5], which was designed for solving stochastic optimization problems. It returns estimates of both the optimal action from a given state and the corresponding optimal value. Building on the finite-time analysis of the Pursuit algorithm, we analyze the finite-time behavior of RASA. Specifically, for a given initial state, we derive the following probability bounds as a function of the number of samples: (i) a lower bound on the probability that RASA will sample the optimal action; and (ii) an up...
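The abstract does not give RASA's exact update rules. The following is a minimal Python sketch of the recursive idea, under stated assumptions:

- **One automaton per call.** Each call at a given state and stage runs its own Pursuit automaton over the available actions.
- **Sampled reward.** The "reward" fed to that automaton for an action is the one-step reward plus a recursively estimated value of the sampled next state.
- **Pursuit update.** The probability vector is updated with the textbook continuous Pursuit rule P ← (1 − μ)P + μ·e_best, where e_best is the unit vector of the action with the highest current estimate.

Several pieces are hypothetical illustrations, not the authors' code:

- the names `rasa_sketch`, `pursuit_update`, `step`, `actions`, `mu`, and `n_samples`;
- sampling every action once before the Pursuit loop;
- returning the mean estimate of the most probable action as the value estimate.

```python
import random
from typing import Callable, Hashable, List, Optional, Sequence, Tuple

# Hypothetical simulator signature: step(state, action, rng) -> (reward, next_state)
Simulator = Callable[[Hashable, Hashable, random.Random], Tuple[float, Hashable]]


def pursuit_update(probs: List[float], estimates: List[float], mu: float) -> List[float]:
    """Continuous Pursuit update: move probability mass toward the action whose
    current estimate is highest. The result still sums to 1."""
    best = max(range(len(estimates)), key=lambda j: estimates[j])
    return [(1.0 - mu) * p + (mu if j == best else 0.0) for j, p in enumerate(probs)]


def rasa_sketch(state: Hashable, stage: int, horizon: int,
                actions: Callable[[Hashable], Sequence[Hashable]],
                step: Simulator, n_samples: int, mu: float,
                rng: random.Random) -> Tuple[Optional[Hashable], float]:
    """Estimate (optimal action, optimal value) at `state` for `stage`.

    Illustrative only: one Pursuit automaton per call, whose sampled 'reward'
    for an action is the one-step reward plus a recursive value estimate of
    the sampled next state. Cost grows exponentially with the horizon.
    """
    if stage == horizon:                 # terminal stage: no action, zero value
        return None, 0.0
    acts = list(actions(state))
    k = len(acts)
    probs = [1.0 / k] * k                # automaton starts uniform
    sums = [0.0] * k                     # running sums of sampled Q-values
    counts = [0] * k

    def sample_q(j: int) -> float:
        reward, nxt = step(state, acts[j], rng)
        _, v_next = rasa_sketch(nxt, stage + 1, horizon, actions, step,
                                n_samples, mu, rng)
        return reward + v_next

    # Assumption: sample each action once so every estimate is defined.
    for j in range(k):
        sums[j] += sample_q(j)
        counts[j] += 1

    for _ in range(n_samples):
        j = rng.choices(range(k), weights=probs)[0]   # draw from the automaton
        sums[j] += sample_q(j)
        counts[j] += 1
        estimates = [s / c for s, c in zip(sums, counts)]
        probs = pursuit_update(probs, estimates, mu)

    best = max(range(k), key=lambda j: probs[j])      # most probable action
    return acts[best], sums[best] / counts[best]


if __name__ == "__main__":
    # Toy deterministic MDP: action "a" pays 1, "b" pays 0, state never changes.
    def toy_actions(s):
        return ["a", "b"]

    def toy_step(s, a, rng):
        return (1.0 if a == "a" else 0.0), s

    print(rasa_sketch("s0", 0, 2, toy_actions, toy_step,
                      n_samples=5, mu=0.2, rng=random.Random(0)))
```

In the toy problem, rewards are deterministic and the horizon is 2. The sketch should therefore favor action "a", with a value estimate of 2.0. With stochastic rewards, the estimates and the selected action vary with `n_samples`, `mu`, and the random seed. That dependence is what finite-time bounds of the kind described above are meant to quantify.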
In many problems of decision making under uncertainty the system has to acquire knowledge of its env...
The concept of automata, central to language theory, is the natural and efficient tool to apprehendv...
This article deals with stochastic processes endowed with the Markov (memoryless) property and evolv...
Based on recent results for multi-armed bandit problems, we propose an adaptive sampling algorithm t...
The problem of a stochastic learning automaton interacting with an unknown random environment is co...
This article considers the problem of analyzing the performance of model predictive controllers in m...
This paper deals with the finite-time analysis (FTA) of learning automata (LA), which is a topic for...
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value it...
In Chapter 2, we propose several two-timescale simulation-based actor-critic algorithms for solution...
Markov automata constitute an expressive continuous-time compositional modelling formalism, featurin...
We develop four simulation-based algorithms for finite-horizon Markov decision processes. Two of the...
This paper presents an overview of the field of Stochastic Learning Automata (LA), and concentrates,...
We consider the problem of "optimal learning" for Markov decision processes with uncertain...
This article proposes several two-timescale simulation-based actor-critic algorithms for solution of...