We consider the problem of online planning in a Markov decision process with discounted rewards for any given initial state. We consider the PAC sample complexity problem of computing, with probability 1−δ, an ε-optimal action using the smallest possible number of calls to the generative model (which provides reward and next-state samples). We design an algorithm, called StOP (for Stochastic-Optimistic Planning), based on the "optimism in the face of uncertainty" principle. StOP can be used in the general setting, requires only a generative model, and enjoys a complexity bound that only depends on the local structure of the MDP.
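The PAC planning setup above can be illustrated with a minimal sketch: a planner that queries a generative model and returns an ε-optimal root action with probability at least 1−δ, using a Hoeffding bound to set the sample count and a truncation horizon to bound the discounting error. All names here (`generative_model`, `epsilon_optimal_action`) and the toy two-state MDP are hypothetical; this uses uniform Monte Carlo sampling for simplicity and is not StOP itself, which allocates samples adaptively via optimistic bounds.

```python
import math
import random

# Hypothetical generative model for a toy 2-state, 2-action MDP (illustration
# only): maps (state, action) to (reward, next_state) with rewards in [0, 1].
def generative_model(state, action, rng):
    reward = 1.0 if (state + action) % 2 == 0 else 0.0
    next_state = (state + action) % 2
    return reward, next_state

def rollout(model, state, action, gamma, horizon, rng, n_actions):
    """One Monte Carlo return: take `action`, then follow a uniformly random
    policy, accumulating gamma-discounted rewards for `horizon` steps."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        reward, state = model(state, action, rng)
        total += discount * reward
        discount *= gamma
        action = rng.randrange(n_actions)
    return total

def epsilon_optimal_action(model, state, n_actions, gamma, eps, delta, seed=0):
    """Return an action whose Q-estimate is within eps of the best estimate,
    with probability at least 1 - delta.  Sample counts follow a Hoeffding
    bound; the horizon bounds the truncation error gamma^H / (1 - gamma)."""
    rng = random.Random(seed)
    # Horizon H with gamma^H / (1 - gamma) <= eps / 2.
    horizon = max(1, math.ceil(math.log(eps * (1 - gamma) / 2) / math.log(gamma)))
    v_max = 1.0 / (1 - gamma)  # returns lie in [0, v_max]
    # Hoeffding: n samples per action give half-width <= eps / 2 w.p. 1 - delta
    # simultaneously over all actions (union bound).
    n = math.ceil(2 * v_max ** 2 * math.log(2 * n_actions / delta) / (eps / 2) ** 2)
    estimates = []
    for action in range(n_actions):
        returns = [rollout(model, state, action, gamma, horizon, rng, n_actions)
                   for _ in range(n)]
        estimates.append(sum(returns) / n)
    return max(range(n_actions), key=estimates.__getitem__)
```

In the toy MDP, action 0 from state 0 yields immediate reward 1 while action 1 yields 0, so the planner selects action 0. The key design point mirrored from the abstract is that the planner interacts with the MDP only through reward/next-state samples from the generative model, never through its transition matrix.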
We consider the problem of online planning in a Markov Decision Process when g...
Markov decision processes (MDP) offer a rich model that has been extensively used by the AI communit...
We provide a method, based on the theory of Markov decision processes, for efficient planning in sto...
We review a class of online planning algorithms for deterministic and stochastic optimal control pro...
We propose an online planning algorithm for finite-action, sparsely stochastic Markov decis...
This paper addresses the problem of online planning in Markov decision processes using a randomized ...
We consider online planning in Markov decision processes (MDPs). In online planning, the agent focus...
The reinforcement learning community has recently intensified its interest in online planning metho...
We consider the problem of planning in a stochastic and discounted environment...
Markov chains are the de facto finite-state model for stochastic dynamical systems, and Markov decis...