We present Reward-Switching Policy Optimization (RSPO), a paradigm to discover diverse strategies in complex RL environments by iteratively finding novel policies that are both locally optimal and sufficiently different from existing ones. To encourage the learning policy to consistently converge towards a previously undiscovered local optimum, RSPO switches between extrinsic and intrinsic rewards via a trajectory-based novelty measurement during the optimization process. When a sampled trajectory is sufficiently distinct, RSPO performs standard policy optimization with extrinsic rewards. For trajectories with high likelihood under existing policies, RSPO utilizes an intrinsic diversity reward to promote exploration. Experiments show that R...
While reinforcement learning algorithms provide automated acquisition of optimal policies, practical...
We propose a method for meta-learning reinforcement learning algorithms by searching over the space ...
Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle r...
A fascinating aspect of nature lies in its ability to produce a large and diverse collection of orga...
Efficient exploration is important for reinforcement learners to achieve high rewards. In multi-agen...
Learning optimal policies in sparse rewards settings is difficult as the learning agent has little t...
We introduce a new constrained optimization method for policy gradient reinforcement learning, which...
Performance, generalizability, and stability are three Reinforcement Learning (RL) challenges releva...
We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-...
To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward f...
Despite the increasing popularity of policy gradient methods, they are yet to be widely utilized in ...
Recent algorithms designed for reinforcement learning tasks focus on finding a single optimal soluti...
Reinforcement learning (RL) aims to learn optimal behaviors for agents to maximize cumulative reward...
A reward mechanism is critical for a Reinforcement Learning agent to learn action policies from rewa...
Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solvi...
While reinforcement learning algorithms provide automated acquisition of optimal policies, practical...
We propose a method for meta-learning reinforcement learning algorithms by searching over the space ...
Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle r...
A fascinating aspect of nature lies in its ability to produce a large and diverse collection of orga...
Efficient exploration is important for reinforcement learners to achieve high rewards. In multi-agen...
Learning optimal policies in sparse rewards settings is difficult as the learning agent has little t...
We introduce a new constrained optimization method for policy gradient reinforcement learning, which...
Performance, generalizability, and stability are three Reinforcement Learning (RL) challenges releva...
We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-...
To convey desired behavior to a Reinforcement Learning (RL) agent, a designer must choose a reward f...
Despite the increasing popularity of policy gradient methods, they are yet to be widely utilized in ...
Recent algorithms designed for reinforcement learning tasks focus on finding a single optimal soluti...
Reinforcement learning (RL) aims to learn optimal behaviors for agents to maximize cumulative reward...
A reward mechanism is critical for a Reinforcement Learning agent to learn action policies from rewa...
Reinforcement Learning has drawn huge interest as a tool for solving optimal control problems. Solvi...
While reinforcement learning algorithms provide automated acquisition of optimal policies, practical...
We propose a method for meta-learning reinforcement learning algorithms by searching over the space ...
Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle r...