The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments, a type of reinforcement learning from human feedback (RLHF). These human preferences are typically assumed to be informed solely by partial return, the sum of rewards along each segment. We find this assumption to be flawed and propose modeling human preferences instead as informed by each segment's regret, a measure of a segment's deviation from optimal decision-making. Given infinitely many preferences generated according to regret, we prove that we can identify a reward functi...
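To make the contrast above concrete, here is a minimal Python sketch (not the paper's implementation) of the two preference models: one scores a segment by its partial return, the other by its negated regret, and both pass score differences through a logistic (Bradley-Terry) link. The regret form shown assumes undiscounted rewards, deterministic transitions, and access to optimal state values V*; all names and numeric values are illustrative placeholders.

```python
import numpy as np

def partial_return(rewards):
    """Partial return of a segment: the sum of its rewards."""
    return float(np.sum(rewards))

def segment_regret(rewards, v_start, v_end):
    """Illustrative (undiscounted, deterministic-transition) regret of a segment:
    how far its outcome falls short of acting optimally from its start state,
    i.e. V*(s_start) - (sum of rewards + V*(s_end))."""
    return float(v_start - (np.sum(rewards) + v_end))

def preference_prob(score_1, score_2, temperature=1.0):
    """Bradley-Terry / logistic link: probability that segment 1 is preferred
    to segment 2 when each segment is scored by some statistic (higher = better)."""
    return 1.0 / (1.0 + np.exp(-(score_1 - score_2) / temperature))

# Toy segments: per-step rewards plus (assumed known) optimal state values at each
# segment's start and end states. These numbers are placeholders for illustration.
r1, v1_start, v1_end = np.array([0.0, 1.0, 0.0]), 2.0, 1.0
r2, v2_start, v2_end = np.array([1.0, 1.0, 1.0]), 6.0, 2.0

# Partial-return model: preference driven by summed reward alone.
p_partial = preference_prob(partial_return(r1), partial_return(r2))

# Regret model: preference driven by negated regret, i.e. by how close each
# segment is to optimal decision-making from where it started.
p_regret = preference_prob(-segment_regret(r1, v1_start, v1_end),
                           -segment_regret(r2, v2_start, v2_end))

print(f"P(seg1 > seg2 | partial return) = {p_partial:.3f}")
print(f"P(seg1 > seg2 | regret)         = {p_regret:.3f}")
```

In this toy case the two models disagree: segment 1 accumulates less reward but incurs zero regret (it acts optimally from its start state), so the regret model prefers it while the partial-return model prefers segment 2.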
In preference-based reinforcement learning (RL), an agent interacts with the environment while recei...
Practical implementations of deep reinforcement learning (deep RL) have been challenging due to an a...
In this paper, we address the issue of fairness in preference-based reinforcement learning (PbRL) in...
Deploying learning systems in the real world requires aligning their objectives with those of the hu...
We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference...
Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward ...
Current approaches to reward learning from human preferences could be used to resolve complex reinforcement le...
While reinforcement learning has led to promising results in robotics, defining an informative rewar...
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a suitably chose...
We consider the problem of preference-based reinforcement learning (PbRL), where, unlike traditional...
When inferring reward functions from human behavior (be it demonstrations, comparisons, physical cor...
Common reinforcement learning algorithms assume access to a numeric feedback signal. The numeric fee...
Inferring reward functions from human behavior is at the center of value alignment: aligning AI obj...
Reinforcement learning from human feedback (RLHF) is effective at aligning large language models (LL...
Reward discounting has become an indispensable ingredient in designing practical reinforcement learn...