In this paper, we study the problem of efficient online reinforcement learning in the infinite-horizon setting when an offline dataset is available at the start. We assume that the offline dataset is generated by an expert with an unknown level of competence, i.e., the expert is not perfect and does not necessarily follow the optimal policy. We show that if the learning agent models the behavioral policy used by the expert (parameterized by a competence parameter), it can do substantially better at minimizing cumulative regret than if it does not. We establish an upper bound on the regret of the exact informed PSRL (posterior sampling for reinforcement learning) algorithm that scales as $\tilde{O}(\sqrt{T})$. This requires a novel prior-dependent regret analysis of Bayesian online learning ...
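The core idea above, using offline expert actions to sharpen the prior before posterior sampling, can be illustrated with a minimal sketch. Every modeling choice here is an assumption for illustration, not the paper's method: a single-state problem (a Bernoulli multi-armed bandit) stands in for the infinite-horizon MDP, the model class is a finite set of candidate arm-mean vectors, and the expert is assumed to pick actions from a softmax (Boltzmann) policy over the true means whose inverse temperature `beta` plays the role of the known competence parameter. The function names `softmax_probs`, `informed_posterior`, and `run_psrl` are hypothetical.

```python
import math
import random
from itertools import permutations


def softmax_probs(values, beta):
    """Assumed expert model: pi(a) proportional to exp(beta * values[a]).

    beta = 0 gives a uniformly random expert; larger beta gives a more competent one.
    """
    m = max(values)
    w = [math.exp(beta * (v - m)) for v in values]
    z = sum(w)
    return [x / z for x in w]


def informed_posterior(models, prior, expert_actions, beta):
    """Bayes update of the prior over candidate models using offline expert actions."""
    logs = []
    for model, p in zip(models, prior):
        probs = softmax_probs(model, beta)
        ll = sum(math.log(probs[a]) for a in expert_actions)  # log-likelihood of demos
        logs.append((math.log(p) if p > 0 else -math.inf) + ll)
    mx = max(logs)
    w = [math.exp(l - mx) for l in logs]  # normalize in log space for stability
    z = sum(w)
    return [x / z for x in w]


def run_psrl(models, weights, true_means, horizon, rng):
    """Posterior sampling over a finite model class with Bernoulli rewards; returns regret."""
    log_w = [math.log(w) if w > 0 else -math.inf for w in weights]
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        mx = max(log_w)
        probs = [math.exp(l - mx) for l in log_w]
        k = rng.choices(range(len(models)), weights=probs)[0]  # sample a model
        arm = max(range(len(true_means)), key=lambda a: models[k][a])  # greedy for it
        reward = rng.random() < true_means[arm]
        regret += best - true_means[arm]
        # Bayesian update: multiply each model's weight by its reward likelihood.
        for i, model in enumerate(models):
            p = model[arm]
            log_w[i] += math.log(p if reward else 1.0 - p)
    return regret


if __name__ == "__main__":
    rng = random.Random(0)
    true_means = (0.2, 0.5, 0.8)
    models = list(permutations(true_means))  # 6 candidate models, one is true
    prior = [1.0 / len(models)] * len(models)
    beta = 10.0  # assumed known competence of the expert
    demos = rng.choices(range(3), weights=softmax_probs(true_means, beta), k=50)
    informed = informed_posterior(models, prior, demos, beta)
    for name, w in [("uninformed", prior), ("informed", informed)]:
        runs = [run_psrl(models, w, true_means, 500, random.Random(s)) for s in range(20)]
        print(name, round(sum(runs) / len(runs), 2))
```

With a large `beta`, the demonstrations concentrate the prior near the true model, so the informed run typically accumulates less regret; with `beta = 0` the expert's actions carry no information and the informed posterior equals the prior. The actual informed PSRL algorithm operates on MDPs and its guarantees come from the paper's analysis, not from this toy.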
We study online learnability of a wide class of problems, extending the results of [25] to general n...
We study online reinforcement learning for finite-horizon deterministic control systems with arbitra...
Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an ...
Reinforcement learning (RL) has gained increasing interest in recent years, being expected to del...
Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an on...
We study the regret of reinforcement learning from offline data generated by a fixed behavior policy...
We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
Reinforcement learning (RL) was originally proposed as a framework to allow agents to l...
Much of modern learning theory has been split between two regimes: the classical offline setting, wh...
We consider a reinforcement learning setting where the learner also has to dea...
We address the problem of Bayesian reinforcement learning using efficient model-based online plannin...
We study online learning in adversarial communicating Markov Decision Processes with full informatio...
Skills or low-level policies in reinforcement learning are temporally extended actions that can spee...
We study the problem of online learning in adversarial bandit problems under a partial observability...