The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, the stringent requirement for immediate rewards is unmet in many real-world applications where the reward is almost always delayed. We study the phenomenon of delayed rewards in generalised linear bandits in a theoretical manner. We show that a natural adaptation of an optimistic algorithm to the delayed feedback setting can achieve regret of $\tilde{O}(d\sqrt{T} + d^{3/2}\,\mathbb{E}[\tau])$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ is the time horizon. This significantly improves upon existing approaches for this setting where the best known ...
University of Minnesota Ph.D. dissertation. May 2020. Major: Statistics. Advisor: Yuhong Yang. 1 com...
In the X-armed bandit problem, an agent sequentially interacts with an environment which yields a reward bas...
In the classical stochastic k-armed bandit problem, in each of a sequence of T rounds, a decision ma...
The stochastic generalised linear bandit is a well-understood model for sequential decision-making p...
We study a variant of the stochastic K-armed bandit problem, which we call "bandits with delayed, ag...
We study a variant of the stochastic K-armed bandit problem, which we call “bandits with delayed, ag...
We investigate a nonstochastic bandit setting in which the loss of an action is not immediately char...
We investigate multiarmed bandits with delayed feedback, where the delays need neither be identical ...
This paper explores a new form of the linear bandit problem in which the algorithm receives the usua...
In many real-world sequential decision-making problems, an action does not immediately reflect on th...
We propose Banker-OMD, a novel framework generalizing the classical Online Mirror Descent (OMD) tech...
We consider a stochastic linear bandit model in which the available actions c...
We propose `Banker-OMD`, a novel framework generalizing the classical Online Mirror Descent (OMD) te...
Motivated by applications to online advertising and recommender systems, we co...