We consider the problem of finding a near-optimal policy using value-function methods in continuous-space, discounted Markovian Decision Problems (MDPs) when only a single trajectory generated under some policy can be used as the input. Since the state space is continuous, one must resort to function approximation. In this paper we study a policy iteration algorithm that iterates over action-value functions, where the iterates are obtained by empirical risk minimization with a loss function that penalizes high magnitudes of the Bellman residual. It turns out that when a linear parameterization is used, the algorithm is equivalent to least-squares policy iteration. Our main result is a finite-sample, high-probability...
We consider the Bellman residual minimization approach for solving discounted ...
This paper presents an approximate policy iteration algorithm for solving infinite-horizon, discount...
Abstract — We consider batch reinforcement learning problems in continuous space, expected total dis...
We consider the problem of finding a near-optimal policy using value-function ...
We consider the problem of finding a near-optimal policy in continuous space, discounted Markovian D...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We consider batch reinforcement learning problems in continuous space, expected total discounted-rew...
We consider batch reinforcement learning problems in continuous space, expected...
We consider batch reinforcement learning problems in continuous space, expected...
ADPRL 2007. Honolulu, Hawaii, Apr 1-5, 2007. We consider batch reinforcement learning problems in c...