Contextual bandits are canonical models for sequential decision-making under uncertainty in environments with time-varying components. In this setting, the expected reward of each bandit arm is the inner product of an unknown parameter vector with that arm's context vector. Classical bandit settings rely heavily on the assumption that the contexts are fully observed, while the study of the richer model of imperfectly observed contextual bandits remains immature. This work considers Greedy reinforcement learning policies that take actions as if the current estimates of the parameter and of the unobserved contexts coincide with the corresponding true values. We establish that the non-asymptotic worst-case regret grows poly-logarithmically with th...
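The certainty-equivalence (Greedy) idea described above can be sketched minimally for the fully observed linear case; this is an illustrative toy, not the paper's construction: the names (`theta_star`, `A`, `b`), the ridge estimator, and the Gaussian context/noise model are all assumptions, and the filtering step needed for imperfectly observed contexts is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000
theta_star = rng.normal(size=d) / np.sqrt(d)  # unknown true parameter

# Ridge-regression sufficient statistics (identity regularization)
A = np.eye(d)       # regularized Gram matrix of chosen contexts
b = np.zeros(d)     # sum of reward-weighted chosen contexts

regret = 0.0
for t in range(T):
    contexts = rng.normal(size=(K, d))           # one context vector per arm
    theta_hat = np.linalg.solve(A, b)            # current least-squares estimate
    arm = int(np.argmax(contexts @ theta_hat))   # greedy: act as if the estimate is true
    x = contexts[arm]
    reward = x @ theta_star + rng.normal(scale=0.1)
    A += np.outer(x, x)                          # update sufficient statistics
    b += reward * x
    regret += np.max(contexts @ theta_star) - x @ theta_star
```

Despite doing no explicit exploration, the time-varying contexts themselves provide enough diversity for the estimate to converge in settings like this, which is the intuition behind regret guarantees for Greedy policies.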
In the problem of (binary) contextual bandits with knapsacks (CBwK), the agent receives an i.i.d. co...
This thesis investigates sequential decision making tasks that fall in the framework of reinforcemen...
We propose a new sequential decision-making setting, combining key aspects of two established online...
Abstract. Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the...
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as ...
In this paper, we investigate the stochastic contextual bandit with general function space and graph...
The data explosion and the development of artificial intelligence (AI) have fueled the demand for recomme...
We consider a special case of bandit problems, named batched bandits, in which an agent observes bat...
40 pages. AISTATS 2021 (oral). Logistic Bandits have recently attracted substanti...
Imitation learning has been widely used to speed up learning in novice agents, by allowing them to l...
We derive an instantaneous (per-round) data-dependent regret bound for stochastic multiarmed bandits...
Inspired by advertising markets, we consider large-scale sequential decision making problems in whic...
University of Minnesota Ph.D. dissertation. May 2020. Major: Statistics. Advisor: Yuhong Yang. 1 com...
We consider the restless multi-armed bandit (RMAB) problem with unknown dynamics in which a...