This paper introduces two novel algorithms for learning behaviors from human-provided rewards. The primary novelty of these algorithms is that instead of treating the feedback as a numeric reward signal, they interpret feedback as a form of discrete communication that depends on both the behavior the trainer is trying to teach and the teaching strategy used by the trainer. For example, some human trainers use a lack of feedback to indicate whether actions are correct or incorrect, and interpreting this lack of feedback accurately can significantly improve learning speed. Results from user studies show that humans use a variety of training strategies in practice and both algorithms can learn a contextual bandit task faster than algorithms th...
designed for interactive supervisory input from a human teacher, several works in both robot and sof...
We study human learning & decision-making in tasks with probabilistic rewards. Recent studies in...
In this work, we address a relatively unexplored aspect of designing agents that learn from human re...
Abstract — In order to be useful in real-world situations, it is critical to allow non-technical use...
For agents and robots to become more useful, they must be able to quickly learn from non-technical u...
Learning from rewards generated by a human trainer observing an agent in action has proven to be a p...
Reactions such as gestures and facial expressions are an abundant, natural source of signal emitted ...
As robots become a mass consumer product, they will need to learn new skills by interacting with typ...
This paper is concerned with training an agent to perform sequential behavior. In previous work we h...
Learning from rewards generated by a human trainer observing an agent in action has been proven to b...
Thesis (Ph.D.), Computer Science, Washington State UniversityAs the number of deployed robots grows,...
AbstractWhile Reinforcement Learning (RL) is not traditionally designed for interactive supervisory ...
Teaching with evaluative feedback involves expectations about how a learner will interpret rewards a...
In order for a person to acquire the type of experiences that changes his behavior, the person needs...
A long term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback t...
designed for interactive supervisory input from a human teacher, several works in both robot and sof...
We study human learning & decision-making in tasks with probabilistic rewards. Recent studies in...
In this work, we address a relatively unexplored aspect of designing agents that learn from human re...
Abstract — In order to be useful in real-world situations, it is critical to allow non-technical use...
For agents and robots to become more useful, they must be able to quickly learn from non-technical u...
Learning from rewards generated by a human trainer observing an agent in action has proven to be a p...
Reactions such as gestures and facial expressions are an abundant, natural source of signal emitted ...
As robots become a mass consumer product, they will need to learn new skills by interacting with typ...
This paper is concerned with training an agent to perform sequential behavior. In previous work we h...
Learning from rewards generated by a human trainer observing an agent in action has been proven to b...
Thesis (Ph.D.), Computer Science, Washington State UniversityAs the number of deployed robots grows,...
AbstractWhile Reinforcement Learning (RL) is not traditionally designed for interactive supervisory ...
Teaching with evaluative feedback involves expectations about how a learner will interpret rewards a...
In order for a person to acquire the type of experiences that changes his behavior, the person needs...
A long term goal of Interactive Reinforcement Learning is to incorporate non-expert human feedback t...
designed for interactive supervisory input from a human teacher, several works in both robot and sof...
We study human learning & decision-making in tasks with probabilistic rewards. Recent studies in...
In this work, we address a relatively unexplored aspect of designing agents that learn from human re...