We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off. We propose an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically incorporate ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems. Theoretically, the designed algorithms based on the ARNPG framework achieve $\tilde{O}(1/T)$ global convergence with exact gradients. Empirically, the ARNPG-guided algorithms also demonstrate superior performan...
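To make one of the criteria in the abstract above concrete, here is a minimal sketch of exact-gradient natural policy gradient (NPG) on proportional fairness, i.e. maximizing $F(\pi)=\sum_i \log V_i^\pi(\rho)$. This is not ARNPG itself. It omits the anchor-changing regularization and runs plain, unregularized softmax NPG on a tabular MDP. The chain rule shows that the natural gradient of $F$ equals that of a single MDP whose reward is $\sum_i w_i r_i$ with $w_i = 1/V_i^\pi(\rho)$. Each iteration is therefore the standard tabular softmax NPG step $\theta \leftarrow \theta + \eta Q/(1-\gamma)$ on that reward. The function names (`evaluate`, `softmax`, `npg_proportional_fairness`), step size, and toy instance are hypothetical choices for illustration.

```python
import numpy as np


def evaluate(P, r, pi, gamma):
    """Exact policy evaluation for a tabular MDP.

    P: (S, A, S) transition probabilities, r: (S, A) rewards, pi: (S, A) policy.
    Returns Q with shape (S, A) and V with shape (S,).
    """
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = np.sum(pi * r, axis=1)           # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = r + gamma * (P @ V)                 # (S, A, S) @ (S,) -> (S, A)
    return Q, V


def softmax(theta):
    z = theta - theta.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def npg_proportional_fairness(P, rewards, rho, gamma=0.9, eta=0.5, iters=200):
    """Hypothetical sketch: unregularized softmax NPG with exact gradients on
    F(pi) = sum_i log V_i^pi(rho). Not ARNPG (no anchor/regularization term).

    rewards: (m, S, A) array with strictly positive entries, so every V_i > 0.
    """
    m, S, A = rewards.shape
    theta = np.zeros((S, A))                # uniform initial policy
    for _ in range(iters):
        pi = softmax(theta)
        evals = [evaluate(P, rewards[i], pi, gamma) for i in range(m)]
        values = np.array([rho @ V for _, V in evals])  # V_i^pi(rho)
        w = 1.0 / values                                 # dF/dV_i for F = sum_i log V_i
        # Chain rule: the natural gradient of F equals that of the single reward
        # sum_i w_i r_i, whose Q-function is sum_i w_i Q_i.
        Q = sum(w[i] * evals[i][0] for i in range(m))
        # Softmax NPG step; using Q instead of the advantage Q - V shifts each
        # row of theta by a constant, which the softmax ignores.
        theta += eta * Q / (1.0 - gamma)
    pi = softmax(theta)
    values = np.array([rho @ evaluate(P, rewards[i], pi, gamma)[1] for i in range(m)])
    return pi, values


# Example: one state with a self-loop, two actions, two reward functions.
P = np.ones((1, 2, 1))
rewards = np.array([[[1.0, 0.1]], [[0.1, 0.5]]])
pi, values = npg_proportional_fairness(P, rewards, rho=np.array([1.0]))
print(pi, values)
```

On this one-state toy instance, maximizing $\log(0.1+0.9p)+\log(0.5-0.4p)$ over the probability $p$ of the first action gives $p^\star = 41/72 \approx 0.569$. That is a mixed policy, whereas optimizing either reward alone would select a single action. The iterates above settle near $p^\star$.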
This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Ma...
We analyze the convergence rate of the unregularized natural policy gradient algorithm with log-line...
In areas where prior data is not available, data scientists use reinforcement learning (RL) on onlin...
This paper is about learning a continuous approximation of the Pareto frontier in Multi-Objective Ma...
Natural policy gradient methods are popular reinforcement learning methods that improve the stabilit...
Much of the recent success of deep reinforcement learning has been driven by r...
We present an in-depth survey of policy gradient methods as they are used in the machine learning co...
Reinforcement learning is a family of machine learning algorithms, in which the system learns to mak...
ICML 2019. Many recent successful (deep) reinforcement learning algorithms make ...
A policy gradient method is a reinforcement learning approach that directly optimizes a parametrized...
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated applications ...
We consider the problem of constrained Markov decision process (CMDP) in continuous state-action sp...
We propose Generalized Trust Region Policy Optimization (GTRPO), a policy gradient Reinforcement Lea...
We introduce a learning method called "gradient-based reinforcement planning" (GR...
Policy gradient methods are a type of reinforcement learning technique that rely upon optimizing pa...