We investigate the classical active pure exploration problem in Markov Decision Processes, where the agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible. We propose a problem-dependent lower bound on the average number of steps required before a correct answer can be given with probability at least 1 − δ. We further provide the first algorithm with an instance-specific sample complexity in this setting. This algorithm addresses the general case of communicating MDPs; we also propose a variant with a reduced exploration rate (and hence faster convergence) under an additional ergodicity assumption. This work extends previous results relative …
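To make the problem setting concrete, here is a minimal, illustrative Python sketch of best-policy identification by navigation in a small MDP. It is not the algorithm described above: uniform action selection stands in for the instance-dependent exploration allocation, a fixed per-(state, action) visit budget (`min_visits`) stands in for a δ-dependent stopping rule, and discounted value iteration (`greedy_policy`, with an assumed discount `gamma`) is used for simplicity even though the abstract concerns communicating MDPs. All names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy communicating MDP, known only to the "environment" (assumed for illustration).
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.uniform(0.0, 1.0, size=(S, A))       # mean reward of each (s, a)

def greedy_policy(P_hat, R_hat, n_iter=500):
    """Value iteration on a (possibly estimated) model; returns the greedy policy."""
    V = np.zeros(S)
    for _ in range(n_iter):
        Q = R_hat + gamma * P_hat @ V        # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Forward (navigation) setting: the agent only observes the trajectory it generates.
counts = np.zeros((S, A, S))
reward_sum = np.zeros((S, A))
state = 0
min_visits = 200                              # naive stopping rule, not the paper's

while counts.sum(axis=2).min() < min_visits:
    # Uniform exploration; the algorithm in the abstract instead tracks an
    # instance-dependent optimal allocation of visits.
    action = rng.integers(A)
    next_state = rng.choice(S, p=P[state, action])
    reward = R[state, action] + rng.normal(0.0, 0.1)
    counts[state, action, next_state] += 1
    reward_sum[state, action] += reward
    state = next_state

# Estimate the model from the trajectory and return the greedy policy on it.
n_sa = counts.sum(axis=2)
P_hat = counts / n_sa[:, :, None]
R_hat = reward_sum / n_sa

print("estimated best policy:", greedy_policy(P_hat, R_hat))
print("true best policy:     ", greedy_policy(P, R))
print("total exploration steps:", int(n_sa.sum()))
```

In this sketch the sample complexity is driven entirely by the arbitrary `min_visits` budget; the point of the abstract's lower bound and algorithm is precisely to replace such a uniform budget with an instance-specific one that certifies the answer with probability at least 1 − δ.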