In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function. Instead, we observe trajectories sampled by an expert that acts according to some policy. The goal is to find a policy that matches the expert's performance on some predefined set of cost functions. We introduce an online variant of AL (Online Apprenticeship Learning; OAL), where the agent is expected to perform comparably to the expert while interacting with the environment. We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms: one for policy optimization and another for learning the worst-case cost. By employing optimistic exploration, we derive a convergent algori...
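The two-player idea in the abstract above can be illustrated with a deliberately small sketch. It is not the OAL algorithm itself: it assumes a single state, a known expert action distribution, costs restricted to the box [0, 1], and no exploration, none of which comes from the paper. The function names, step sizes, and the example expert distribution are all hypothetical. The policy player runs entropic mirror descent (exponentiated gradient) to minimize cost, while the cost player runs Euclidean mirror descent (projected gradient ascent) to find the cost on which the policy looks worst relative to the expert.

```python
import math


def mirror_descent_al(expert, steps=2000, eta_policy=0.1, eta_cost=0.03):
    """Toy single-state apprenticeship learning via two no-regret players.

    Hypothetical setup (not from the paper): one state, len(expert) actions,
    a known expert action distribution, and costs restricted to [0, 1].
    Returns the average of the policy iterates.
    """
    n = len(expert)
    policy = [1.0 / n] * n   # policy player starts uniform on the simplex
    cost = [0.5] * n         # cost player starts at the centre of the box
    avg_policy = [0.0] * n

    for _ in range(steps):
        # Accumulate the running average of the policy iterates.
        for a in range(n):
            avg_policy[a] += policy[a] / steps

        # Cost player: Euclidean mirror descent (projected gradient ascent)
        # on <policy - expert, cost>, clipped back onto [0, 1]^n.
        new_cost = [
            min(1.0, max(0.0, cost[a] + eta_cost * (policy[a] - expert[a])))
            for a in range(n)
        ]

        # Policy player: entropic mirror descent (exponentiated gradient)
        # on <policy, cost>, using the cost it faced this round.
        weights = [policy[a] * math.exp(-eta_policy * cost[a]) for a in range(n)]
        total = sum(weights)
        policy = [w / total for w in weights]
        cost = new_cost

    return avg_policy


def worst_case_gap(policy, expert):
    """Maximum over costs in [0, 1]^n of <policy - expert, cost>."""
    return sum(max(0.0, p - e) for p, e in zip(policy, expert))


if __name__ == "__main__":
    expert = [0.7, 0.2, 0.1]  # hypothetical expert action distribution
    learned = mirror_descent_al(expert)
    print(learned, worst_case_gap(learned, expert))
```

Because both players are no-regret, the averaged policy's worst-case gap to the expert shrinks as the number of rounds grows. In this toy setting the gap is bounded by the sum of the two players' regrets divided by the number of rounds.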
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
One of the fundamental problems of artificial intelligence is learning how to behave optimally. With...
We consider the applications of the Frank-Wolfe (FW) algorithm for Apprenticeship Learning (AL). In ...
We consider large-scale Markov decision processes (MDPs) with an unknown cost function and employ sto...
In this paper we consider online learning in finite Markov decision processes (MDPs) with changing ...
This paper deals with the problem of learning from demonstrations, where an agent called ...
This paper addresses the problem of apprenticeship learning, that is learning ...
This paper addresses the problem of apprenticeship learning, that is learning ...
This paper deals with the problem of learning from demonstrations, where an ag...
We consider learning in a Markov decision process where we are not explicitly given a reward functio...
We study the problem of online learning Markov Decision Processes (MDPs) when both the transition di...
We consider the problem of minimizing the long term average expected regret of an agent in an online...
This paper develops a generalized apprenticeship learning protocol for reinforcement-learning agent...
Reinforcement learning (RL) has gained an increasing interest in recent years, being expected to del...