We consider large-scale Markov decision processes (MDPs) with an unknown costfunction and employ stochastic convex optimization tools to address the problem of imitation learning, which consists of learning a policy from a finite set of expert demonstrations. We adopt the apprenticeship learning formalism, which carries the assumption that the true cost function can be represented as a linear combination of some known features. Existing inverse reinforcement learning algorithms come with strong theoretical guarantees, but are computationally expensive because they use reinforcement learning or planning algorithms as a subroutine. On the other hand state-of-the-art policy gradient based algorithms (like IM-REINFORCE, IM-TRPO and GAIL), achie...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
International audienceWe investigate a powerful nonconvex optimization approach based on Difference ...
We consider the applications of the Frank-Wolfe (FW) algorithm for Apprenticeship Learning (AL). In ...
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the ...
We consider learning in a Markov decision process where we are not explicitly given a reward functio...
This dissertation studies the applicability of convex optimization to the formal verification and sy...
We consider large-scale Markov decision processes with an unknown cost function and address the prob...
In this thesis we study several machine learning problems that are all linked with the minimization ...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
International audienceWe investigate a powerful nonconvex optimization approach based on Difference ...
We consider the applications of the Frank-Wolfe (FW) algorithm for Apprenticeship Learning (AL). In ...
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the ...
We consider learning in a Markov decision process where we are not explicitly given a reward functio...
This dissertation studies the applicability of convex optimization to the formal verification and sy...
We consider large-scale Markov decision processes with an unknown cost function and address the prob...
In this thesis we study several machine learning problems that are all linked with the minimization ...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
Increasing attention has been paid to reinforcement learning algorithms in recent years, partly due ...
We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
We consider the problem of imitation learning where the examples, demonstrated by an expert, cover o...
We consider the problem of apprenticeship learning where the examples, demonstrated by an expert, co...
We study the problem of learning a policy in a Markov decision process (MDP) based on observations o...
This thesis is mostly focused on reinforcement learning, which is viewed as an optimization problem:...
International audienceWe investigate a powerful nonconvex optimization approach based on Difference ...