We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
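The two ingredients named in the abstract, state aggregation and upper confidence bounds, can be illustrated with a minimal sketch. This is not the paper's algorithm. It assumes a one-dimensional state space [0, 1], rewards in [0, 1], a uniform partition into `n_intervals` cells, and a generic Hoeffding-style confidence width. The names `aggregate`, `optimistic_reward`, and the `bias` parameter (standing in for the discretization error that Hölder continuity would bound) are hypothetical.

```python
import math


def aggregate(state: float, n_intervals: int) -> int:
    """Map a state in [0, 1] to the index of the interval that contains it.

    Assumption: a uniform partition of [0, 1] into n_intervals cells.
    """
    if not 0.0 <= state <= 1.0:
        raise ValueError("state must lie in [0, 1]")
    # The right endpoint 1.0 is assigned to the last interval.
    return min(int(state * n_intervals), n_intervals - 1)


def optimistic_reward(reward_sum: float, visits: int, t: int,
                      bias: float = 0.0) -> float:
    """Upper confidence estimate of the mean reward of one aggregated state.

    Assumption: a generic Hoeffding-style width sqrt(2 log t / visits),
    not the constants used in the paper. `bias` is a hypothetical extra
    term for the error introduced by treating a whole interval as one state.
    """
    if visits == 0:
        # Unvisited cells receive the largest possible reward (optimism).
        return 1.0
    mean = reward_sum / visits
    width = math.sqrt(2.0 * math.log(max(t, 2)) / visits)
    return min(1.0, mean + width + bias)
```

With four intervals, `aggregate(0.49, 4)` falls into the cell with index 1, and a cell that has never been visited gets the maximal optimistic reward of 1.0. A finer partition reduces the aggregation error but leaves fewer samples per cell, so the confidence widths shrink more slowly.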
Reinforcement learning (RL) has gained an increasing interest in recent years, being expected to del...
We consider a reinforcement learning setting where the learner does not have e...
We consider the problem of online reinforcement learning when several state re...
We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds...
We consider the problem of undiscounted reinforcement learning in continuous s...
We consider an agent interacting with an environment in a single stream of actions, observations, an...
We study online reinforcement learning for finite-horizon deterministic control systems with arbitra...
We consider the finite horizon continuous reinforcement learning problem. Our contribution is three-...
We consider an agent interacting with an environment in a single stream of actions, observations, ...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
We consider a reinforcement learning setting where the learner also has to dea...
We consider a class of sequential decision making problems in the presence of uncertainty, which bel...
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy...
The problem of reinforcement learning in an unknown and discrete Markov Decisi...
We study reinforcement learning for continuous-time Markov decision processes (MDPs) in the finite-h...