In interactive multi-objective reinforcement learning (MORL), an agent must simultaneously learn about the environment and about the user's preferences, in order to quickly zoom in on the decisions the user is likely to prefer. In this paper we study interactive MORL in the context of multi-objective multi-armed bandits. Unlike earlier approaches to interactive MORL, we do not make stringent assumptions about the user's utility function, but allow for non-linear preferences. We propose a new approach called Gaussian-process Utility Thompson Sampling (GUTS), which employs non-parametric Bayesian learning to accommodate any type of utility function, exploits monotonicity information, and limits the number of queries posed to the user.
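The core loop the abstract describes — sampling both the arms' mean reward vectors and a utility function from their posteriors, then pulling the arm that maximizes the sampled utility of the sampled means — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a toy two-objective bandit with hypothetical arm means, fits an RBF-kernel Gaussian process to a handful of made-up elicited utility points, and omits GUTS's monotonicity constraints and its query-limiting user-interaction logic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-objective bandit: true mean reward vector per arm (hypothetical numbers).
true_means = np.array([[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]])

def pull(arm):
    """Return a noisy 2-objective reward for the chosen arm."""
    return true_means[arm] + rng.normal(0.0, 0.1, size=2)

def rbf(A, B, ls=0.5):
    """Squared-exponential kernel between two point sets."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

# A few pretend elicited utility ratings (non-linear, monotone in both objectives).
X_u = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
y_u = np.array([0.0, 0.3, 0.6, 1.0, 0.55])

# GP posterior bookkeeping: Cholesky of the kernel matrix and the weight vector.
K = rbf(X_u, X_u) + 1e-6 * np.eye(len(X_u))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_u))

def sample_utility(Z):
    """Draw one function sample from the GP posterior over utilities at points Z."""
    Ks = rbf(Z, X_u)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    cov = rbf(Z, Z) - v.T @ v + 1e-8 * np.eye(len(Z))
    return rng.multivariate_normal(mean, cov)

# Thompson sampling over arms with a Gaussian posterior per arm and objective.
n_arms = len(true_means)
counts = np.zeros(n_arms)
sums = np.zeros((n_arms, 2))

for t in range(300):
    # Posterior sample of each arm's mean reward vector (flat prior, known noise).
    post_mean = sums / np.maximum(counts, 1)[:, None]
    post_std = 1.0 / np.sqrt(np.maximum(counts, 1))[:, None]
    mu_sample = post_mean + post_std * rng.normal(size=(n_arms, 2))
    # Score all sampled means with one sampled utility function; pull the argmax.
    arm = int(np.argmax(sample_utility(mu_sample)))
    counts[arm] += 1
    sums[arm] += pull(arm)

best = int(np.argmax(counts))
```

The design choice to resample the utility function every round, rather than use its posterior mean, is what keeps exploration honest: arms whose sampled mean vectors look good under *some* plausible utility still get pulled while uncertainty remains.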