We consider a Reinforcement Learning setup without any (esp. MDP) assumptions on the environment. State aggregation and, more generally, feature reinforcement learning are concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This impl...
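To make the aggregation idea concrete, the following is a minimal sketch (not taken from the paper above) of the generic pipeline it refers to: a hypothetical feature map `phi` reduces raw histories to a small number of aggregated states, an MDP over those states is estimated from logged transitions, and the reduced MDP is solved by value iteration. The function name `aggregate_and_solve`, the `transitions` format, and `phi` itself are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def aggregate_and_solve(transitions, phi, n_states, n_actions, gamma=0.9, iters=200):
    """Estimate and solve an MDP over aggregated (reduced) states.

    transitions: iterable of (history, action, reward, next_history) tuples,
                 with integer actions in [0, n_actions).
    phi: hypothetical aggregation map from a raw history to an index in [0, n_states).
    """
    counts = np.zeros((n_states, n_actions, n_states))
    rewards = np.zeros((n_states, n_actions))
    for h, a, r, h_next in transitions:
        s, s_next = phi(h), phi(h_next)
        counts[s, a, s_next] += 1
        rewards[s, a] += r
    visits = np.maximum(counts.sum(axis=2), 1)      # avoid division by zero for unseen pairs
    P = counts / visits[:, :, None]                 # empirical transition kernel on reduced states
    R = rewards / visits                            # empirical mean reward on reduced states
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):                          # value iteration on the reduced MDP
        Q = R + gamma * P @ Q.max(axis=1)
    return Q, Q.argmax(axis=1)                      # reduced q-values and greedy policy
```

The point of the abstract above is that the greedy policy returned by such a procedure can remain near-optimal even when the reduced process is not actually Markovian, provided the optimal (q-)values are approximately representable as functions of the reduced states.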
Reinforcement learning (RL) studies the problem where an agent maximizes its cumulative reward throu...
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finit...
Leveraging an equivalence property in the state-space of a Markov Decision Pro...
General purpose intelligent learning agents cycle through (complex, non-MDP) sequences of observatio...
We consider an agent interacting with an environment in a single stream of act...
We consider a reinforcement learning setting where the learner does not have e...
The reinforcement learning (RL) framework formalizes the notion of learning with interactions. Many ...
Leveraging an equivalence property on the set of states or state-action pairs in a Markov Decision P...
In this paper, we give a brief review of Markov Decision Processes (MDPs), and how Reinforcement Lea...
We describe how to use robust Markov decision processes for value function approximation with state...
We consider the problem of online reinforcement learning when several state re...
In Reinforcement Learning (RL), regret guarantees scaling with the square root of the time horizon h...
In this paper, we revisit the regret of undiscounted reinforcement learning in MDPs with a birth and...
We address the problem of reinforcement learning in which observations may exhibit an arbitrary form...
The application of reinforcement learning (RL) algorithms is often hindered by the combinatorial exp...