Bounded parameter Markov Decision Processes (BMDPs) address uncertainty in the parameters of a Markov Decision Process (MDP). Unlike in an MDP, the notion of an optimal policy for a BMDP is not entirely straightforward. We consider two notions of optimality, based on optimistic and pessimistic criteria, which have previously been analyzed for discounted BMDPs. Here we provide results for average reward BMDPs. We establish a fundamental relationship between the discounted and average reward problems, prove the existence of Blackwell optimal policies, and, for both notions of optimality, derive algorithms that converge to the optimal value function.
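To make the optimistic and pessimistic criteria concrete, the following is a minimal sketch of interval value iteration for the discounted case, in the spirit of the algorithms analyzed for discounted BMDPs; it is not the average reward algorithms derived in this paper. The function names, the greedy construction of the extremal transition distribution, and the stopping tolerance are illustrative assumptions.

```python
import numpy as np

def extreme_distribution(p_lo, p_hi, values, optimistic=True):
    """Pick the transition distribution inside the interval box
    [p_lo, p_hi] (intersected with the probability simplex) that
    maximizes (optimistic) or minimizes (pessimistic) the expected
    value. Greedy construction: start every successor at its lower
    bound, then push the remaining mass toward high-value (resp.
    low-value) successors, respecting the upper bounds. Assumes a
    well-formed BMDP: sum(p_lo) <= 1 <= sum(p_hi)."""
    order = np.argsort(values)[::-1] if optimistic else np.argsort(values)
    p = p_lo.copy()
    slack = 1.0 - p.sum()  # probability mass still to be distributed
    for s in order:
        add = min(p_hi[s] - p_lo[s], slack)
        p[s] += add
        slack -= add
        if slack <= 0.0:
            break
    return p

def interval_value_iteration(P_lo, P_hi, R, gamma, optimistic=True,
                             tol=1e-8, max_iter=10_000):
    """Value iteration for a discounted BMDP (illustrative sketch).
    P_lo, P_hi : (S, A, S) arrays of lower/upper transition bounds.
    R          : (S, A) reward array.
    Returns the optimistic (resp. pessimistic) optimal value function."""
    S, A, _ = P_lo.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = np.empty((S, A))
        for s in range(S):
            for a in range(A):
                p = extreme_distribution(P_lo[s, a], P_hi[s, a], V, optimistic)
                Q[s, a] = R[s, a] + gamma * (p @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

The inner step is the only place the BMDP differs from an ordinary MDP backup: nature resolves the parameter uncertainty in the agent's favor (optimistic) or against it (pessimistic) before the maximization over actions.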