We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the challenge of interactively learning about these constraints as a novel linear bandit problem which we call constrained linear best-arm identification. To solve this problem, we propose the Adaptive Constraint Learning (ACOL) algorithm. We provide an instance-dependent lower bound for constrained linear best-arm identification and show that ACOL's sample complexity matches the lower bound in the worst case. In the average case, ACOL's sample complexity bound is still significantly tighter than bounds of s...
We study the attainable regret for online linear optimization problems with bandit feedback, where u...
We provide the first algorithm for online bandit linear optimization whose regret after T rounds is ...
We consider a special case of bandit problems, named batched bandits, in which an agent observes bat...
In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a ...
Our goal is to efficiently learn reward functions encoding a human's preferences for how a dynamical...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control li...
Model selection in the context of bandit optimization is a challenging problem, as it requires balan...
In interactive multi-objective reinforcement learning (MORL), an agent has to simultaneously learn a...
A fundamental challenge in interactive learning and decision making, ranging from bandit problems to...
We study human learning & decision-making in tasks with probabilistic rewards. Recent studies in...
Conveying complex objectives to reinforcement learning (RL) agents often requires meticulous reward ...
This paper explores a new form of the linear bandit problem in which the algorithm receives the usua...
AISTATS 2021, oral. 40 pages. Logistic Bandits have recently attracted substanti...
Linear bandits have a wide variety of applications including recommendation systems yet they make on...