We consider a reinforcement learning setting where the learner also has to deal with the problem of finding a suitable state-representation function from a given set of models. This has to be done while interacting with the environment in an online fashion (no resets), and the goal is to have small regret with respect to any Markov model in the set. For this setting, the BLB algorithm has recently been proposed, which achieves regret of order $T^{2/3}$, provided that the given set of models is finite. Our first contribution is to extend this result to a countably infinite set of models. Moreover, the BLB regret bound suffers from an additive term that can be exponential in the diameter of the MDP involved, since the ...
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space...
We study a general class of learning algorithms, which we call regret-matching algorithms, along wit...
We consider the finite horizon continuous reinforcement learning problem. Our contribution is three-...
We consider a reinforcement learning setting where the learner also has to dea...
We consider an agent interacting with an environment in a single stream of actions, observations, an...
We consider an agent interacting with an environment in a single stream of actions, observations, ...
We consider the problem of online reinforcement learning when several state re...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
We consider a reinforcement learning setting where the learner does not have e...
We study the role of the representation of state-action value functions in reg...
We study online reinforcement learning for finite-horizon deterministic control systems with arbitra...
We consider a class of sequential decision making problems in the presence of uncertainty, which bel...
Reinforcement learning (RL) has gained an increasing interest in recent years, being expected to del...
The problem of reinforcement learning in an unknown and discrete Markov Decisi...
The problem of selecting the right state-representation in a reinforcement learning problem is consi...