Deploying learning systems in the real world requires aligning their objectives with those of the humans they interact with. Existing algorithmic approaches to this alignment try to infer these objectives from human feedback. The correctness of these algorithms depends crucially on several simplifying assumptions about 1) how humans represent these objectives, 2) how humans respond to queries given these objectives, and 3) how well the hypothesis space represents these objectives. In this thesis, we question the robustness of existing approaches to misspecifications in these assumptions and develop principled approaches to overcome such misspecifications. We begin by studying misspecifications in the hypothesis class assumed by the learner...
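The feedback-based objective inference described above can be illustrated with a minimal sketch: fitting a linear reward function to noisy pairwise preferences under a Bradley-Terry response model. This is the well-specified baseline case that such robustness analyses start from; the feature dimensions, data sizes, and learning rate here are toy assumptions, not details from the thesis.

```python
import numpy as np

# Minimal sketch (illustrative, not the thesis's algorithm): learning a
# linear reward function from noisy pairwise preferences under the
# Bradley-Terry model. All sizes and constants are toy assumptions.

rng = np.random.default_rng(0)

def preference_prob(w, feats_a, feats_b):
    """Bradley-Terry probability that segment A is preferred over B,
    where a segment's return is the sum of per-step linear rewards."""
    diff = (feats_a.sum(axis=0) - feats_b.sum(axis=0)) @ w
    return 1.0 / (1.0 + np.exp(-diff))

# Simulate a human with hidden reward weights answering 200 comparisons.
w_true = np.array([1.0, -0.5])
pairs = []
for _ in range(200):
    fa = rng.normal(size=(5, 2))   # 5 steps, 2 features per step
    fb = rng.normal(size=(5, 2))
    label = rng.random() < preference_prob(w_true, fa, fb)  # noisy answer
    pairs.append((fa, fb, float(label)))

# Fit w by gradient ascent on the preference log-likelihood.
w = np.zeros(2)
lr = 0.05
for _ in range(300):
    grad = np.zeros(2)
    for fa, fb, label in pairs:
        p = preference_prob(w, fa, fb)
        grad += (label - p) * (fa.sum(axis=0) - fb.sum(axis=0))
    w += lr * grad / len(pairs)

# The learned direction should roughly recover the hidden w_true direction.
print(np.round(w / np.linalg.norm(w), 2))
```

The misspecifications the thesis studies correspond to breaking the assumptions this sketch bakes in: that the true reward is linear in the chosen features (hypothesis class), and that the human answers according to the Bradley-Terry likelihood (response model).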
Current approaches to reward learning from human preferences could be used to resolve complex reinforcement le...
Humans can leverage physical interaction to teach robot arms. This physical interaction takes multip...
Common reinforcement learning algorithms assume access to a numeric feedback signal. The numeric fee...
The utility of reinforcement learning is limited by the alignment of reward functions with the inter...
The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function R from a policy pi. To...
When inferring reward functions from human behavior (be it demonstrations, comparisons, physical cor...
Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward ...
Practical implementations of deep reinforcement learning (deep RL) have been challenging due to an a...
Objective: In evaluating our choices, we often suffer from two tragic relativities. First, when our ...
The field of Reinforcement Learning is concerned with teaching agents to make optimal decisions t...
In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance...
Inferring reward functions from human behavior is at the center of value alignment - aligning AI obj...
We define a novel neuro-symbolic framework, argumentative reward learning, which combines preference...
While reinforcement learning has led to promising results in robotics, defining an informative rewar...