We consider the problem of controlling hierarchical Markov decision processes and develop a simulation-based two-timescale actor-critic algorithm in a general framework. We also develop approximation algorithms that require less computation and satisfy a performance bound. One of these is a three-timescale actor-critic algorithm; the other is a two-timescale algorithm that operates in two separate stages. All our algorithms recursively update randomized policies using the simultaneous perturbation stochastic approximation (SPSA) methodology. We briefly present the convergence analysis of our algorithms. We then present numerical experiments on a problem of production planning in semiconductor fa...
We develop a simulation-based, two-timescale actor-critic algorithm for infinite horizon Markov deci...
We develop a simulation-based algorithm for finite horizon Markov decision processes with finite sta...
We develop a simulation-based algorithm for finite horizon Markov decision processes with finite sta...
A two-timescale simulation-based actor-critic algorithm for solution of infinite horizon Markov deci...
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision ...
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision ...
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision ...
The actor-critic algorithm of Barto and others for simulation-based optimization of Markov decision ...
In Chapter 2, we propose several two-timescale simulation-based actor-critic algorithms for solution...
Due to their non-stationarity, finite-horizon Markov decision processes (FH-MDPs) have one probabili...
We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Mar...
This article proposes several two-timescale simulation-based actor-critic algorithms for solution of...
Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated tra...
Algorithms for learning the optimal policy of a Markov decision process (MDP) based on simulated tra...