Inspired by advertising markets, we consider large-scale sequential decision making problems in which a learner must deploy an algorithm to behave optimally under uncertainty. Although many of these problems can be modeled as contextual bandit problems, we argue that the tools and techniques for analyzing bandit problems with large numbers of actions and contexts can be greatly expanded. While convexity and metric-similarity assumptions on the process generating rewards have yielded some algorithms in existing literature, certain types of assumptions that have been fruitful in offline supervised learning settings have yet to even be considered. Notably missing, for example, is any kind of graphical model approach to assuming structured rewa...
Cascading bandits is a natural and popular model that frames the task of learning to rank from Berno...
A lot of software systems today need to make real-time decisions to optimize an objective of interes...
This manuscript deals with the estimation of the optimal rule and its mean reward in a simple ban...
This dissertation focuses on sequential learning and inference under unknown models. In this class o...
We consider online bandit learning in which at every time step, an algorithm h...
We study the problem of decision-making under uncertainty in the bandit setting. This thesis goes be...
We demonstrate a modification of the algorithm of Dani et al. for the online linear optimization prob...
We study structured multi-armed bandits, which is the problem of online decision-making under uncert...
This thesis considers the multi-armed bandit (MAB) problem, both the traditional bandit feedback and...
The multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in t...
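The exploration–exploitation tradeoff at the heart of the MAB problem can be illustrated with a minimal sketch of the classic UCB1 index policy. This is not the method of any one abstract above; the arm means and horizon are hypothetical, chosen only to make the behavior visible.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch on Bernoulli arms with hypothetical true
    means `arm_means`. Pull each arm once, then at step t pick the arm
    maximizing empirical mean + sqrt(2 ln t / n_i)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: try every arm once
        else:
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts

# With a clear gap between arms, the best arm ends up pulled most often.
counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

The index term shrinks as an arm is pulled more, so under-explored arms keep getting sampled until their optimism is exhausted; this is the mechanism behind the logarithmic regret guarantees the abstracts above refer to.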
In performative prediction, the deployment of a predictive model triggers a shift in the data distri...
We study the attainable regret for online linear optimization problems with bandit feedback, where u...
University of Technology Sydney, Faculty of Engineering and Information Technology. The sequential de...