To ensure stability of learning, state-of-the-art generalized policy iteration algorithms augment the policy improvement step with a trust region constraint bounding the information loss. The size of the trust region is commonly determined by the Kullback-Leibler (KL) divergence, which not only captures the notion of distance well but also yields closed-form solutions. In this paper, we consider a more general class of f-divergences and derive the corresponding policy update rules. The generic solution is expressed through the derivative of the convex conjugate function to f and includes the KL solution as a special case. Within the class of f-divergences, we further focus on a one-parameter family of α-divergences to study effects of the c...
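As a rough sketch of the update such a derivation yields (the notation $Q$, $q$, $\eta$, $\lambda$ below is standard but assumed here, not taken from the abstract): for an improvement step $\max_\pi \mathbb{E}_\pi[Q(s,a)]$ subject to a trust-region constraint $D_f(\pi \,\|\, q) \le \epsilon$ around the old policy $q$, Lagrangian duality gives

$$\pi(a \mid s) \;=\; q(a \mid s)\, {f^{*}}'\!\left(\frac{Q(s,a) - \lambda(s)}{\eta}\right),$$

where $f^{*}$ is the convex conjugate of $f$, $\eta > 0$ is the multiplier of the trust-region constraint, and $\lambda(s)$ enforces normalization of the policy. For the KL case $f(x) = x \log x$, one has ${f^{*}}'(y) = e^{\,y-1}$, recovering the familiar exponential reweighting $\pi(a \mid s) \propto q(a \mid s)\, e^{Q(s,a)/\eta}$ as the special case mentioned above.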
Many of the recent trajectory optimization algorithms alternate between linear approximation of the ...
We address problems with a recent attempt by Eva, Hartmann and Rad to justify the Kullback-Leibler d...
In this work, the probability of an event under some joint distribution is bounded by measuring it w...
First-order gradient descent is to date the most commonly used optimization method for training deep...
Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leibler (KL) d...
An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthe...
We present an analytical policy update rule that is independent of parametric function approximators...
Many policy search algorithms minimize the Kullback-Leibler (KL) divergence to a certain target dis...
Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to...
As a distributed learning paradigm, Federated Learning (FL) faces the communication bottleneck issue...
This paper introduces the $f$-EI$(\phi)$ algorithm, a novel iterative algorithm which operates on me...
The recent remarkable progress of deep reinforcement learning (DRL) stands on regularization of poli...
In the context of learning deterministic policies in continuous domains, we re...
We propose new change of measure inequalities based on $f$-divergences (of which the Kullbac...