We present a class of metrics, defined on the state space of a finite Markov decision process (MDP), each of which is sound with respect to stochastic bisimulation, a notion of MDP state equivalence derived from the theory of concurrent processes. Such metrics are based on similar metrics developed in the context of labelled Markov processes, and like those, are suitable for state space aggregation. Furthermore, we restrict our attention to a subset of this class that is appropriate for certain reinforcement learning (RL) tasks, specifically, infinite horizon tasks with an expected total discounted reward optimality criterion. Given such an RL metric, we provide bounds relating it to the optimal value function of the original MDP a...
We present Metric-E3, a provably near-optimal algorithm for reinforcement learning in Markov decisio...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
We present a framework to address a class of sequential decision making problems. Our framework feat...
We present an approximation scheme for solving Markov Decision Processes (MDPs) in whi...
Bisimulation is a notion of behavioural equivalence on the states of a transiti...
Bisimulation is a notion of behavioural equivalence on the states of a transition system. Its defi-...
We present a provably near-optimal algorithm for reinforcement learning in Markov decision processe...
In most practical applications of reinforcement learning, it is untenable to maintain direct estimat...
We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Pr...
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision...
In this paper we address the following basic feasibility problem for infinite-horizon Markov decisio...
A Markov decision process (MDP) relies on the notions of state, describing the current situation of ...
We define a metric for measuring behavior similarity between states in a Markov decision process (MD...
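Several of the abstracts above concern computing bisimulation metrics on finite MDPs by fixed-point iteration of an operator that combines a reward-difference term with a Kantorovich distance between transition distributions. As a minimal sketch of that iteration (the toy two-state MDP, the weights `cR`/`cT`, and all names are invented for illustration; on two-point supports the Kantorovich term reduces to a scalar, so no linear program is needed):

```python
# Hedged sketch: fixed-point iteration for a bisimulation metric
#   d(s, t) = max_a [ cR * |R(s,a) - R(t,a)| + cT * W1(P(.|s,a), P(.|t,a); d) ]
# on a toy 2-state, 2-action MDP. For distributions on two points,
# W1 reduces to |p(s0) - q(s0)| * d(s0, s1). All numbers are illustrative.

A = [0, 1]  # actions

# Rewards R[s, a] and transition probabilities P[s, a] = Pr(next state = 0).
R = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.9, (1, 1): 0.0}
P = {(0, 0): 0.7, (0, 1): 0.2, (1, 0): 0.6, (1, 1): 0.2}

cR, cT = 1.0, 0.9  # cT < 1 makes the operator a contraction


def bisim_metric(iters=200):
    """Iterate the metric operator on d = d(s0, s1); d(s, s) = 0 always."""
    d = 0.0
    for _ in range(iters):
        d = max(cR * abs(R[0, a] - R[1, a]) + cT * abs(P[0, a] - P[1, a]) * d
                for a in A)
    return d


print(round(bisim_metric(), 4))  # → 0.1099  (fixed point 0.1 / (1 - 0.09))
```

Since `cT < 1`, the iteration converges geometrically; for larger state spaces the Kantorovich term is typically computed exactly with a transportation linear program rather than this two-point shortcut.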