Our goal is to efficiently learn reward functions that encode a human's preferences for how a dynamical system should act. This poses two challenges. First, in many problems it is difficult for people to demonstrate the desired system trajectory (such as a high-DOF robot arm motion or an aggressive driving maneuver), or even to assign a numerical reward to an action or trajectory. We build on work in label ranking and propose to learn from preferences (or comparisons) instead: the person tells the system which of two trajectories they prefer. Second, the learned reward function depends strongly on which environments and trajectories were experienced during training. We thus take an active...
Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward ...
The field of deep reinforcement learning has seen major successes recently, achieving superhuman per...
We consider the problem of learning good trajectories for manipulation tasks. This is challenging be...
While reward functions are an essential component of many robot learning methods, defining ...
This work tackles in-situ robotics: the goal is to learn a policy while the robot operates in the re...
Human-in-the-loop reinforcement learning (RL) methods actively integrate human knowledge to create r...
This paper focuses on reinforcement learning (RL) with limited prior knowledge...
This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the do...
This paper makes a first step toward the integration of two subfields of machine learning, namely pr...
Reward functions are an essential component of many robot learning methods. Defining such ...
Typically when learning about what people want and don't want, we look to human action as evidence: ...
Specifying a numeric reward function for reinforcement learning typically requires a lot of hand-tun...
With progress in enabling autonomous cars to drive safely on the road, it is time to ask how should ...
This thesis studies a central problem in human-robot interaction (HRI): How can non-expert users spe...