What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL). DistRL does, however, allow other functionals to be evaluated approximately. We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations. These resul...
We present a class of metrics, defined on the state space of a finite Markov decision process (MDP)...
Problems involving optimal sequential decision making in uncertain dynamic systems arise in domains such as e...
Markov decision processes (MDPs) and their variants are widely studied in the theory of controls for...
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the pre...
We consider finite horizon Markov decision processes under performance measures that involve both th...
For countable-state decision processes (dynamic programming problems), a general class of ob...
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with unce...
Bounded parameter Markov Decision Processes (BMDPs) address the issue of dealing with uncertainty in...
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer...
We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives....
We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We...
In the present paper the expected average reward criterion is considered instead of the aver...
Reinforcement learning is a general computational framework for learning sequential decision strate...
A Markov decision process (MDP) relies on the notions of state, describing the current situation of ...