We derive a new expectation maximization (EM) algorithm for policy optimization in linear Gaussian Markov decision processes in which the reward function is parameterised as a flexible mixture of Gaussians. The approach combines analytical tractability with numerical optimization. As a result, it is more flexible and general than closed-form solutions such as the widely used linear quadratic Gaussian (LQG) controllers, and it is more accurate and faster than optimization methods that rely on approximation and simulation. Partial analytical solutions, though costly, eliminate the need for simulation and hence avoid approximation error. The experiments will show that for the same cost of computation,...
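To make the "analytical tractability" concrete, here is a minimal sketch of exact policy evaluation. It is not the authors' EM algorithm. It assumes a scalar state, linear Gaussian dynamics x' = a x + b u + noise with noise variance q, a deterministic linear policy u = k x + c, and a reward r(x) = sum_j w_j N(x; m_j, s_j). All names and the scalar setting are illustrative assumptions. Under these assumptions the state stays Gaussian, and the expectation of a Gaussian bump under a Gaussian state is again a Gaussian, E[N(x; m, s)] = N(mu; m, sigma^2 + s). The expected cumulative reward can therefore be computed without simulation.

```python
import math
from typing import List, Tuple

# Hypothetical reward component: (weight w, centre m, width variance s), with s > 0.
Component = Tuple[float, float, float]


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) evaluated at x (requires var > 0)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_reward(x: float, components: List[Component]) -> float:
    """Assumed reward r(x) = sum_j w_j * N(x; m_j, s_j)."""
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def expected_mixture_reward(mu: float, var: float, components: List[Component]) -> float:
    """Closed-form E[r(x)] for x ~ N(mu, var).

    Uses the Gaussian convolution identity
    E_{x ~ N(mu, var)}[N(x; m, s)] = N(mu; m, var + s).
    """
    return sum(w * gaussian_pdf(mu, m, var + s) for w, m, s in components)


def evaluate_linear_policy(a: float, b: float, q: float, k: float, c: float,
                           mu0: float, var0: float,
                           components: List[Component], horizon: int) -> float:
    """Exact expected cumulative reward sum_{t=0}^{horizon-1} E[r(x_t)].

    Assumed dynamics: x_{t+1} = a*x_t + b*u_t + noise, noise ~ N(0, q),
    with a deterministic linear policy u_t = k*x_t + c. The state stays
    Gaussian, so only its mean and variance are propagated (no simulation).
    """
    closed_loop = a + b * k  # closed-loop gain under the linear policy
    mu, var = mu0, var0
    total = 0.0
    for _ in range(horizon):
        total += expected_mixture_reward(mu, var, components)
        mu = closed_loop * mu + b * c
        var = closed_loop ** 2 * var + q
    return total


if __name__ == "__main__":
    comps = [(1.0, 0.0, 0.5), (0.5, 1.0, 0.3)]
    print(evaluate_linear_policy(0.9, 1.0, 0.1, -0.5, 0.2, 1.0, 0.2, comps, horizon=5))
```

For a vector state, the same idea carries over by replacing the scalar variance update with Sigma' = (A + B K) Sigma (A + B K)^T + Q and the 1-D densities with multivariate Gaussian densities.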
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision...
We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives...
We propose a simulation-based algorithm for optimizing the average reward in a Markov Reward Process...
We consider finite horizon Markov decision processes under performance measures that involve both th...
We introduce an on-line algorithm for finding local maxima of the average reward in a Partially Obse...
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with unce...
We consider a discrete-time, finite-state Markov reward process that depends on a set of pa...
Continuous-time Markov decision processes provide a very powerful mathematical framework to solve...
This paper deals with the average expected reward criterion for continuous-time Markov decis...
We consider multistage decision processes where the criterion function is an expectation of minimum func...
We study the convergence of Markov Decision Processes made of a large number of objects to optimizat...